DOCUMENT RESUME

ED 069 737                                        TM 002 192

AUTHOR          Larsson, B.
TITLE           An Experimental Study of the Efficiency of Human
                Information Processing.
INSTITUTION     School of Education, Malmo (Sweden). Dept. of
                Educational and Psychological Research.
REPORT NO       R-35
PUB DATE        Jul 72
NOTE            53p.

EDRS PRICE      MF-$0.65 HC-$3.29
DESCRIPTORS     Bayesian Statistics; *Cognitive Processes; Hypothesis
                Testing; Information Processing; *Mathematical
                Models; *Measurement Techniques; *Neurological
                Organization; Sampling; Statistical Analysis



ABSTRACT 

An experimental study of the efficiency of human 
information processing is based on the Bayesian model for simple 
hypothesis testing with fixed binomial sampling. Each of 60 subjects 
is analyzed with separate ANOVAs focusing on two efficiency 
variables. Sample size and critical value are also analyzed. Subjects 
show very different utilization of the independent variables 
diagnosticity, prior probability and loss, both for their choices and 
their efficiency of the choices. Giving a part of the experiment as a 
group test generates similar efficiency results. Efficiency does not 
seem to be related to intelligence. Final comment connects the 
experiment with the lens model. (Author/LH)



AN EXPERIMENTAL STUDY OF THE EFFICIENCY OF HUMAN
INFORMATION PROCESSING



Bernt Larsson 




This study is based on the Bayesian model for simple hypothesis 
testing with fixed binomial sampling. Each of 60 subjects is analysed 
with separate ANOVAs focusing on two efficiency variables. Sample 
size and critical value are also analysed. Subjects show very different
utilization of the independent variables diagnosticity, prior probability 
and loss, both for their choices and their efficiency of the choices. 
Giving a part of the experiment as a group test generates similar 
efficiency results. Efficiency does not seem to be related to intelligence. 
Final comment connects the experiment with the lens model.



INTRODUCTION

This study deals with behavioural decisions and has its theoretical
anchoring within Bayesian decision theory. (See e.g. DeGroot, 1970
and Pratt, Raiffa & Schlaifer, 1965.) Although it can sometimes be
meaningfully used by those preferring orthodox statistics, Bayes's
theorem is a central point for Bayesians. It therefore seems natural that
a substantial proportion of Bayesian research is directly concerned with
this theorem, e.g. in the form of probability revision experiments.

Another substantial proportion is interested in choices of actions and
different expected utility theories. While Bayes's theorem tells you how
to produce new probabilities when new information reaches you, theories
of expected utility tell you how to use them for decision making. One
prominent kind of Bayesian research, which takes both points into
consideration, has been labelled information seeking experiments.

Such experiments can involve sequential sampling, fixed sampling or
both. Sequential sampling provides the experimenter with more information
about subjects than fixed sampling does, but it is as a rule more laborious
to perform. Also, if one wants to connect behaviour with statistical
theories, these are more complex for sequential sampling than for their
fixed sampling equivalents or may even be nonexistent. The most used
sampling model, sequential or not, is the binomial one. Two others have
been used with some frequency, viz. the multinomial and the normal
model. When these models are used in information seeking experiments,
they are almost always connected with simple hypothesis testing, while
more complex hypothesis testing and point estimation are rare.

Like other fixed sampling models for simple hypothesis testing, the
binomial model has three determinants. They are the diagnosticity of
data, the prior probabilities and the losses, where only the first one is
directly related to the binomial model, while the other two are provided by
Bayesian decision theory in order to complete it. Diagnosticity can be and
has been measured in many ways, both by statisticians and behavioural
scientists, and is a measure of how much one observation can discriminate
between the hypotheses. It is a function of the difference between the
parameter values stated by the two hypotheses. The prior probabilities
are the probabilities of the hypotheses, prior to sampling. Losses comprise
the "economic" outcome of the choice of a hypothesis and the cost of
sampling. Information seeking experiments do not often vary all three
determinants simultaneously.



The experiment of this report is mainly chosen to illustrate some new
dependent variables. It then seems reasonable to select an experiment
which is common within a suitable kind of Bayesian research. Therefore,
an information seeking experiment varying all three determinants, as
described in the last paragraph, was chosen. However, this study
concentrates on the consequences of the subject's decisions and not on
how he makes them, although the latter seems to be of overwhelming
interest in the reports issued hitherto. This does not mean that choices
are neglected here: two choice variables and two consequence variables
will be used as dependent variables.

Although Bayesian experiments seldom hinder you from showing a
considerable mathematical machinery, I have not felt this to be necessary
or even desirable, so the mathematics is kept to a minimum. This
also goes for the data presented. As an unusual feature this study presents
hardly any tables (some can be found in the appendices) but instead presents
important data directly in the text. This may irritate some readers, but
it has two distinct advantages. It reduces the number of pages, and you
can read continuously without interrupting yourself to look at tables,
which perhaps contain only some data of interest to you.

The experiment has an "appendix": the group testing, which comprises
one decision test and ten intelligence tests. The purpose of this addition
is to see whether intelligence is related to efficiency of decision making
and whether a group test for decision making can give information
equivalent to that of the more expensive experiment. Both the experiment
and the tests are discussed in the next section. Although one may argue
about how to present individual results, perhaps because we are not so
used to these as to group results, I hope that nobody regards them as
unimportant. I personally find them at least as important as group results
and therefore present several individual results. The final comment makes
use of Brunswik's lens model, which I think is a beautiful research
paradigm, capable of many applications.

The main questions of this study may thus be put in the following way: 

1. How is the choice of the number of observations related to diagnos- 
ticity, prior probability and loss? 

2. How are the hypotheses chosen? 

3. How is the efficiency of the choices related to diagnosticity, prior
probability and loss?

4. How is efficiency related to intelligence?

5. How are the efficiency results of the experiment related to those of
the decision test?

6. How much do the group results mirror the individual results?



DATA COLLECTION

Data have been gathered at two different sessions, labelled the
experiment and the group testing. The experiment is factorially designed
and refers, in some degree, to realistic decision situations. The group
testing involves intelligence tests and a modified third of the experiment,
given as a group test and referring to more hypothetical decision
situations. Thus, there are possibilities for comparing individual behaviour
in hypothetical and in less hypothetical decision situations and connecting
this behaviour with intelligence. Several dependent variables will be
used and they also comprise comparisons with optimal behaviour.

The experiment 

The experiment uses the statistical model for simple hypothesis testing
with binomial sampling. Every situation can be described in the following
way: There is an infinite set H with two kinds of element, $H_0$ and $H_1$.
These elements are in turn infinite sets with elements x which are either
0 or 1 and constitute the observations. The experimenter draws randomly,
with probabilities $P(H_i)$, an element from H, and from this element the
subject draws randomly n observations. The observations are independently
and identically distributed with $P(x_j = 1 \mid H_i) = p_i$ ($p_0 < p_1$),
so that $k = \sum_j x_j$ is binomially distributed. In common statistical
language $H_i$ is called "hypothesis" and $P(H_i)$ "prior probability".

The subject must make two decisions: a choice of n and a choice of
$H_i$. The latter could be guided by the outcome of the observations. A
wrong choice of $H_i$ implies a monetary loss $c_i$, while a correct choice
gives zero loss, and every observation must be paid for with one unit of
the c-scale. The commonest definition of optimal choice of $H_i$ will give
an expected loss $L = \min_i \{ c_i P(H_j \mid k, n) \}$, $j \ne i$. This
so-called Bayes strategy means that $H_0$ is chosen if $k \le k_B$ and
otherwise $H_1$ is chosen. The Bayes value $k_B$ is calculated from the
equation $c_0 P(H_1 \mid k, n) = c_1 P(H_0 \mid k, n)$. Finally, the optimal
choice of n is such that $R_o = \min_n (L + n)$ is obtained. This total
expected loss thus refers to non-sequential sampling and will
be of particular interest in this study.
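
The model just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the program used for the study: the function names and the search bound `n_max` are hypothetical, and the Bayes cut-off is found by a direct search over integer cut-offs instead of by solving the equation above. The loss convention follows the text: $c_0$ is lost when $H_0$ is wrongly chosen, $c_1$ when $H_1$ is wrongly chosen, and each observation costs one unit.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k ones in n independent draws with P(one) = p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def bayes_for_n(n, p0, p1, prior0, c0, c1):
    """Best integer cut-off for a fixed n (choose H0 if k <= cut-off)
    and its total expected loss, sampling cost included."""
    pmf0 = [binom_pmf(k, n, p0) for k in range(n + 1)]
    pmf1 = [binom_pmf(k, n, p1) for k in range(n + 1)]
    # Cut-off -1 means "always choose H1": c1 is lost whenever H0 is true.
    best_k, best_r = -1, c1 * prior0 + n
    cdf0 = cdf1 = 0.0
    for k_c in range(n + 1):
        cdf0 += pmf0[k_c]  # P(k <= k_c | H0)
        cdf1 += pmf1[k_c]  # P(k <= k_c | H1)
        # c0 is lost if H0 is chosen while H1 is true, c1 in the reverse case.
        r = c0 * (1 - prior0) * cdf1 + c1 * prior0 * (1 - cdf0) + n
        if r < best_r:
            best_k, best_r = k_c, r
    return best_k, best_r

def optimal_design(p0, p1, prior0, c0, c1, n_max=200):
    """Search n = 0..n_max (assumed bound) for the smallest total expected loss."""
    best = None
    for n in range(n_max + 1):
        k_b, r = bayes_for_n(n, p0, p1, prior0, c0, c1)
        if best is None or r < best[2]:
            best = (n, k_b, r)
    return best  # (optimal n, Bayes cut-off at that n, minimal total expected loss)

if __name__ == "__main__":
    # One situation of the kind described: d = 0.2, P(H0) = 0.5, equal losses.
    print(optimal_design(0.4, 0.6, 0.5, 300, 300))
```

Searching over cut-offs is sufficient here because, with $p_0 < p_1$, the posterior probability of $H_1$ increases with k, so the Bayes strategy is always of the cut-off form.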

The design

A situation is fully characterized by the parameters $p_0$, $p_1$, $P(H_0)$,
$c_0$ and $c_1$, and if an experimenter uses the above model there is often
interest in including some of these parameters as independent variables.
The most frequent situations used in Bayesian experiments with the binomial
model have $p_0 + p_1 = 1$, $P(H_0) = 0.5$ and $c_0 = c_1$. Less often the
experimenter also varies the prior probability or lets $c_0 \ne c_1$, and in
some rare cases still greater variation of the parameters is constructed.

To get an idea of the model I have analysed 525 parameter combinations
on a computer. The combinations analysed have the following values:
$p_0 + p_1 = 0.6\,(0.2)\,1.4$ (i.e. from 0.6 to 1.4 in steps of 0.2),
$d = p_1 - p_0 = 0.1\,(0.1)\,0.3$, $P(H_0) = 0.2\,(0.1)\,0.8$
and $(c_0, c_1)$ = (150, 600), (200, 400), (300, 300), (400, 200) and (600, 150).
Among other things, the computer calculated $k_B$, $n_o$ (the optimal number
of observations) and $R_o$ for every situation. Some of the results are
presented in Larsson (1970). As there are greater differences in $n_o$ and
$R_o$ between d values for constant $p_0 + p_1$ than vice versa,
$p_0 + p_1 = 1.0$ was chosen because of greater simplicity, thus eliminating
420 situations. All three d values were included in the experiment though I
was doubtful about d = 0.1, as most of the R curves (as functions of n) are
here very flat around $R_o$, which implies poor discrimination in R even for
rather great variation in n. But 105 situations were too many for an
experiment, and in the first place I skipped all combinations with
$P(H_0)$ = 0.2 and 0.8 and $(c_0, c_1)$ = (150, 600) and (600, 150) because
these situations were considered extreme, generating too many situations
with $n_o = 0$. As I intended to repeat the experiment there were still too
many situations left so, finally, I also took away $P(H_0)$ = 0.4 and 0.6.
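
The successive reductions of the parameter grid can be checked by simple enumeration. The sketch below is illustrative only; the variable names are hypothetical, and each situation is stored as a tuple (sum of the two proportions, difference d, prior probability of the first hypothesis, loss pair).

```python
from itertools import product

sums = [0.6, 0.8, 1.0, 1.2, 1.4]                 # p0 + p1
ds = [0.1, 0.2, 0.3]                             # d = p1 - p0
priors = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]     # P(H0)
losses = [(150, 600), (200, 400), (300, 300), (400, 200), (600, 150)]

# All parameter combinations analysed on the computer.
full = list(product(sums, ds, priors, losses))

# Step 1: keep only p0 + p1 = 1.0.
step1 = [s for s in full if s[0] == 1.0]

# Step 2: drop the extreme priors and the extreme loss pairs.
step2 = [s for s in step1
         if s[2] not in (0.2, 0.8) and s[3] not in ((150, 600), (600, 150))]

# Step 3: drop P(H0) = 0.4 and 0.6.
step3 = [s for s in step2 if s[2] not in (0.4, 0.6)]

def proportions(total, d):
    """Recover (p0, p1) from their sum and difference."""
    return (total - d) / 2, (total + d) / 2

print(len(full), len(step1), len(step2), len(step3))  # 525 105 45 27
```

The counts reproduce the figures in the text: 525 combinations, 105 after fixing the sum of the proportions (420 eliminated), and 27 in the final design.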

This leaves you with an experiment where three independent variables
(d, $P(H_0)$ and $c_0/c_1$), which have three levels each, are fully crossed.
All independent variables are within-subjects variables so that every
subject has the possibility of comparing the 27 situations. I
think that this possibility often generates greater variation in behaviour
than the case with between-subjects variables, but this is not the reason
for the special choice here. The main reason is rather that within-subjects
variables give easier comparisons with the group tests, where every item
is naturally a within-subjects variable. Thus, speaking in the language
of ANOVA, the design of the experiment is a 3 x 3 x 3 factorial, all factors
being fixed and with repeated measurement. The experiment is given three
times to every subject, resulting in 81 trials per subject. To avoid order
effects the situations are presented in different random sequences to every
subject. Appendix 1 shows $n_o$, $k_B$ and $R_o$ for the 27 different
situations.



An $H_0$ or $H_1$ was then chosen in accordance with $P(H_0)$ for the 81
trials, and for every p value the appropriate number of binomial sequences
with n = 1(1)200 was generated with the aid of a computer. The computer
was also used to prepare an extensive table of R values for all
combinations of n and critical k values in the range n = 1(1)80 and
k = 0(1)n. This table will be used to determine values of certain dependent
variables described later. (A small number of combinations with n > 80 also
needed to be calculated when it was shown that some subjects made more than
80 observations.)

The dependent variables

As in many other kinds of research, the dependent variable in Bayesian
experiments can be classified as a choice variable or as a consequence
variable. For instance, when a person answers a multiple-choice item,
the particular alternative chosen constitutes a choice variable, while
the evaluation of the item as a correct or wrong answer defines a
consequence variable. Although comparisons of a subject's behaviour
with the behaviour of a model are far from unusual in Bayesian
experiments, choice variables are nevertheless the commonest kind of
dependent variables. We have e.g. the number of observations n, the
posterior probability $P(H_0 \mid k, n)$ and the likelihood ratio
$P(k \mid H_0, n) / P(k \mid H_1, n)$. Concerning the consequence variables
used, one may mention the accuracy ratio and different kinds of scoring
rules for probability assessments: see Slovic & Lichtenstein (1971) and
Staël von Holstein (1970), respectively. A consequence variable often
refers to a model: it is a function of two results of a choice variable,
the subject's result and the result according to the model. (This is not
necessary: the consequence variable can be used to compare two subjects,
a subject with a group, etc.)

This study will concentrate on consequence variables but it also
contains two choice variables. These are $k_c$ and n. The subject decides
to make n observations and selects a critical value $k_c$ such that he
chooses $H_0$ if $k \le k_c$ and $H_1$ otherwise. According to the
statistical model, the corresponding optimal choices will be denoted $k_B$
and $n_o$. The consequence variables have to do with losses. However, the
actual loss in a situation, which is n, $c_0 + n$ or $c_1 + n$, will not be
used. We will instead use the expected loss R, which is a function of $k_c$
and n only (for a given situation). The expectation is over samples: for
fixed $k_c$ and n, R is the arithmetic mean of the actual loss when sampling
is repeated an infinite number of times. Coupled with R we define the
efficiency $E = R_o / R$, which is examined more closely in Larsson (1970).
Due to the definition of $R_o$ the range of E is $0 \le E \le 1$. Along
with R and E we shall also define $R_B$ and $E_B$. $R_B$ ($E_B$) is R (E)
corrected for deviation of $k_c$ from $k_B$, that is, $R_B$ ($E_B$) differs
from $R_o$ (1) only to the extent to which n is nonoptimal.

Summing up, we have $R = R(k_c, n)$, $R_B = R(k_B, n)$, $R_o = R(k_B, n_o)$,
$E = R_o / R$ and $E_B = R_o / R_B$. For a certain situation, $k_B$ is a
known linear function of n, but $k_c$ is not in general a known function of
n, which means that a construction of $R(k_c, n_o)$ cannot be done in the
same way as for $R_B$. However, R and $R_B$ (E and $E_B$) will be
sufficient, I hope, to give an idea about the partial effects of non-optimal
$k_c$ and n. All four consequence variables (R, $R_B$, E and $E_B$) will
initially be analysed, but only E and $E_B$ will be used throughout as a
result of this analysis.
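
As an illustration of how the two efficiency measures separate the effect of a non-optimal cut-off from that of a non-optimal number of observations, they can be computed as in the following sketch. It is written under assumptions: the function names are hypothetical, the Bayes cut-off is found by searching integer cut-offs, and the optimal number of observations is searched only up to an assumed bound `n_max`.

```python
from math import comb

def expected_loss(k_c, n, p0, p1, prior0, c0, c1):
    """Total expected loss when H0 is chosen if k <= k_c
    (k_c = -1 means 'always choose H1'); one unit is paid per observation."""
    def pmf(k, p):
        return comb(n, k) * p**k * (1 - p)**(n - k)
    wrong_h0 = sum(pmf(k, p1) for k in range(k_c + 1))         # H0 chosen, H1 true
    wrong_h1 = sum(pmf(k, p0) for k in range(k_c + 1, n + 1))  # H1 chosen, H0 true
    return c0 * (1 - prior0) * wrong_h0 + c1 * prior0 * wrong_h1 + n

def bayes_loss(n, *situation):
    """Smallest expected loss attainable with exactly n observations."""
    return min(expected_loss(k, n, *situation) for k in range(-1, n + 1))

def efficiencies(k_c, n, situation, n_max=60):
    """Return (E, E_B) for a subject's choice (k_c, n) in one situation."""
    r = expected_loss(k_c, n, *situation)      # loss of the subject's own choice
    r_b = bayes_loss(n, *situation)            # same n, cut-off corrected
    r_o = min(bayes_loss(m, *situation)        # both n and cut-off optimal
              for m in range(max(n, n_max) + 1))
    return r_o / r, r_o / r_b

if __name__ == "__main__":
    situation = (0.4, 0.6, 0.5, 300, 300)      # p0, p1, P(H0), c0, c1
    print(efficiencies(4, 5, situation))       # poor cut-off, few observations
```

A subject who happens to use the Bayes cut-off for his own n obtains identical values for the two measures; any difference between them is due to the cut-off alone.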

The performance

The experiment was carried through by six persons working at the
Department of Educational and Psychological Research, School of Education
in Malmö. Every experimenter provided ten subjects. The choice was
restricted to subjects who were studying, or had studied, at university
level, were not married to the experimenter and had no difficulties in
understanding Swedish. The distribution of the sixty subjects as to faculty,
sex and age is shown in appendix 2. (The categories are those used later:
"Humanities" includes one divinity student and two medical students while
"Natural sciences" includes four students of technology.) The subjects
cannot be regarded as a random sample from a population containing
academic persons, such as students in Sweden, nor were they intended to be.
Discussion of sample, population and so-called significance tests will be
taken up later in connection with the presentation of the results.

After an introduction of the experimental conditions and training of the
experimenters, the experiment was performed during three weeks. The
experiment was run individually and lasted about 150 minutes per subject.
The experimenter introduced the experiment to the subject with the help
of written instructions and five training trials. The unit of the c-scale,
which equals the cost of one observation, was fixed at 0.1 Swedish crowns.
The hypotheses were visualized as two bags, A and B, containing an
enormous number of cards, which were marked with either 0 or 1. The
proportions of cards marked with 1 for the two bags were given in
writing for each problem. The subject was told that the experimenter
had randomly chosen one bag out of many bags, where the proportion of
A bags was a certain number, given in writing to the subject for each
problem. Then the losses and the observation cost were explained to
the subject and they were also given in writing. It was pointed out that
many observations made a loss improbable but gave a great observation
cost, while few observations gave hardly any observation cost but made
a loss quite probable: the subject should consider a balance between
these two factors when making observations. The possible outcomes of
a certain number of observations were explained. It was said that a great
number of cards marked with 1 indicated that the experimenter had
chosen a B bag, available to the subject for sampling. On the other hand,
a small number of such cards pointed to an A bag. The subject had to
decide on a cut-off point: which was the largest number of cards,
marked with 1, for which he preferred to guess on A? It was also said
that, if he thought so, he could make zero observations and just choose
a hypothesis. When he had chosen n and $k_c$ the experimenter told him
the outcome k from the simulated binomial sequences. He then wrote
down the hypothesis that he chose (as a confirmation) and an estimate of
the posterior probability (not used in this report). As we have no interest
in learning in this study, no feedback was given to the subject on whether
he had chosen the correct hypothesis or not. The subject was not paid per
hour but had a fixed amount of money from which he had to pay his losses.
The subject was told that he could keep the amount left when the
experiment was over, and he was also informed what this amount could be at
most. This was done to motivate him, but the truth is that the amount
left was transformed so that he got something between zero and eighty
Swedish crowns, depending on how well he succeeded in relation to
other subjects. (The arithmetic mean of this amount corresponded to
ten crowns per hour.)

The group testing 

The group testing was held within a month after the subject had taken the
experiment. It lasted about five hours and comprised eleven tests. One
of the tests presents the 27 situations of the experiment in modified form.
The modifications are the following: the situations are given in the same
sequence to all subjects, the outcome k is unknown to the subject, he is
paid per hour (and does not pay any loss), and the instruction and the
test form are therefore somewhat changed. This test will in the following
be called the decision test and has the same dependent variables as those
described for the experiment.

The other ten tests are supposed to measure some aspects of
intelligence. They are selected from a larger pool of tests given to
students doing their last term in the "gymnasium". (Students passing this
form of school qualify for university studies at an age which is usually
19.) The results of this testing are reported in Holmquist (1967). I selected
tests which seem to have a tolerable reliability, which do not show any
floor or ceiling effect, and which measure several aspects of intelligence.
From factor analyses reported in the above paper the selected tests seem
to measure (for these students) verbal understanding, verbal fluency,
inductive reasoning, spatial ability and perception, with two tests for
each factor. The intelligence tests are listed in appendix 3 and will be
more closely described when results are discussed.

Of the 60 subjects in the experiment only 56 completed the group
testing. Three persons were ill and one person left the testing when the
last test was given.



RESULTS

After some comments on the statistical treatment of data, there is
first a discussion of the choice between different dependent variables.
The results of the experiment, which are the main points of this report,
are then presented in four parts; the divisions are group versus
individual results and consequence versus choice variables. The results of
the group testing are used partly for a comparison between the decision
test and the experiment and partly for a comparison between decision
results and the results of the intelligence tests. The section concerning
the group testing also comprises discussions of reliability.

Data processing 

The statistical treatment of data is based on linear models. Univariate
as well as multivariate analysis is used. The approach is wholly
descriptive, even if I use methods which by tradition involve inference.
This means that the reader cannot find one single probability referring to a
significant result in the text. There are several reasons for this. The
most important one is that it is very difficult to describe a population
of persons to which my sample of subjects can refer. The sixty subjects
cannot be regarded as a random sample. Although it is not uncommon
in the behavioural sciences to make statistical inferences based on
non-random samples, I prefer not to do so. However, I will not deny that
the results of such samples still contain some possibilities of making
generalizations. Such things can also be found in this report, at least
as hypotheses, but I find it meaningless to present "exact" significance
levels. The generalizations are, by the way, not confined to samples of
subjects. We may also have samples of situations and samples of
actions, but statistical theory is poorly equipped for this kind of inference.

The second reason concerns the assumption of (multivariate) normal
distribution. A good many of the distributions of this study cannot be
regarded as normal; some are very different from this
bell-shaped "ideal". The talk of robustness, which Bradley (1968, 2.3)
has named the myth of robustness, is hardly applicable here, due to
the severe deviations from normality. (Also, statisticians have very
diverse opinions on this matter.) Non-parametric statistics has not
attracted me, because I lack either suitable tests or suitable programs
for my purposes.

The third reason has to do with the statistical treatment of separate
individuals, where there will be trouble with the assumption of
independent observations. Although the situations are randomised for each
subject it is not easy to decide whether observations are independent
or not between repetitions, which they should be if you want to use
ANOVA for inferential purposes. (An interesting question here would
also be the problem of generalization: to what behaviour population
could you infer from observations of a single person?) As a fourth
reason I can add that a significant result has in itself little importance
concerning ANOVA for the total group of subjects, because even a very
small effect produces a significant F ratio due to the large number of
observations.

The elements of the descriptive data presented are arithmetic means,
standard deviations and product-moment correlations. Group results
and individual results of the experiment are mainly based on ANOVA, the
design of which has also been used when discussing standard
deviations and correlations. ANOVA of the group results is based on a
3 x 3 x 3 x 3 x N factorial design for the total group and subgroups
according to sex, age and education. ANOVA of the individual results is
based on a 3 x 3 x 3 x 3 factorial design, with one ANOVA for every
subject. The basic characteristics of the results here are relevant means
and Hays' $\omega^2$, which is explained later. The discussion of the
individual ANOVA results has also been supported by a method which
identifies outliers. ANOVA has not been used for $k_c$, because this
quantity is dependent on the choice of n and is non-numerical when n is
zero. I have instead analysed it concerning linear relation to n, both for
each situation and for each subject.

No ANOVA has been performed for the decision test but the design is
used in a subjective way when comparing it with the experiment. This
section comprises the consequence variables only. Besides discussions
based on single means, standard deviations and correlations, some
information comes from canonical correlation analysis and factor analysis.
However, neither of them is very convenient: the canonical analysis
contains too many variables in relation to the number of subjects and
one cannot restrict the weight vectors by suitable hypotheses, and the kind
of factor analysis available does not give a direct comparison between
the decision test and the experiment. These analyses are more convenient
for the comparison between the intelligence tests and the decision
results, for which they constitute the main methods. A rather large
part of the section concerning the group testing is devoted to discussion
of reliability, both for single situations and for sum scores.

It must be underlined that this study contains certain information
losses, which do not become more excusable because most studies in
the behavioural sciences also suffer from the same "illness". It is
understood in most applications of the usual product-moment
correlation that if two variables are related then they are linearly
related. If not, this correlation can be regarded as a lower bound of the
total relation. The product-moment correlation is used in this study to
discuss certain (minor) results and is a base for reliability discussions,
canonical correlation analysis and factor analysis. What a substantial
nonlinearity can imply for the result of these analyses is not easy to say.
There are methods for checking nonlinearity and my only defense for
not having used them is the great amount of extra work they would have
involved. However, the main result of this study is free from the above
accusations as ANOVA also handles nonlinearity. That nonlinearity is
not without importance can be seen from the following example, which
refers to the statistical model for the dependent variable n. Here ANOVA
shows that the seven effects can predict n perfectly. But using only the
three independent variables in an ordinary linear multiple regression
analysis would have given (I have not done it) a meager result, since all
three variables are nonlinearly related to n.

The choice between R, $R_B$, E and $E_B$

If no result guides the choice, I will prefer E and $E_B$ to R and $R_B$,
because the former variables have absolute scales and involve
comparisons with optimal behaviour. The case can also arise that only one of
the variables will be chosen. The choice will be based first of all on
correlations between the variables, and second on reliability and
distributions. The statistics are calculated from the whole group of subjects
and, as a rule, for every situation, which can mean 108 distributions
as we have 27 different situations replicated three times in the
experiment and given once as a group test.

The linear correlation between R and E has $-0.998 \le r \le -0.761$ with a
mean of -0.923, and the correlation between $R_B$ and $E_B$ has
$-0.999 \le r \le -0.896$ with a mean of -0.969. The latter correlations are,
with two exceptions out of 108, not smaller than the corresponding
correlations between R and E (in absolute value). Owing to the high or
extremely high correlations one can choose either R or E and either $R_B$ or
$E_B$. The correlation between R and $R_B$ has $-0.236 \le r \le 0.965$ with
a mean of 0.519, and the correlation between E and $E_B$ varies so that
$-0.160 \le r \le 0.954$ with mean 0.544. From this it is clear that R (E)
and $R_B$ ($E_B$) cannot be regarded as similar: both have to be used. For
situations with low correlations the correction for deviation from the Bayes
strategy has far from the same effect on all subjects. However, it is
not obvious to me whether to choose R and $R_B$ or E and $E_B$, as the
correlation structure is so similar for the two pairs of dependent variables.

The four sets of the 27 different situations may be regarded as a
test with 27 items given four times. The square of a multiple
correlation, $R^2$, has been calculated for every item in every set, where
the item is regarded as a dependent variable and the other 26 items as
independent variables. These correlations can be seen as crude estimates
of the item reliabilities (according to classical reliability theory). We
have $0.442 \le R^2 \le 0.943$ with a mean of 0.777 for R and
$0.472 \le R^2 \le 0.924$ with a mean of 0.724 for E. We have further
$0.479 \le R^2 \le 0.983$ with a mean of 0.861 for $R_B$, while $E_B$ has
$0.482 \le R^2 \le 0.980$ with mean 0.849. Likewise, the reliabilities of
the sums of 27 items do not differ between R and E (between $R_B$ and
$E_B$), but do differ between R and $R_B$ (between E and $E_B$) as above.
Thus, reliability will hardly give any cues whether to choose R and $R_B$
or E and $E_B$. More will be said about reliability
later in another connection.
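
The squared multiple correlation of each item with all the other items can be obtained from the inverse of the item correlation matrix, using the standard identity that the squared multiple correlation for item i equals one minus the reciprocal of the i-th diagonal element of that inverse. The following is a minimal sketch of that identity, not the program used in the study; the function names are hypothetical and the small Gauss-Jordan routine is included only to keep the example self-contained.

```python
def invert(matrix):
    """Gauss-Jordan inverse of a small, non-singular square matrix."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def squared_multiple_correlations(corr):
    """For each variable, the squared multiple correlation with all the others."""
    inv = invert(corr)
    return [1 - 1 / inv[i][i] for i in range(len(corr))]

if __name__ == "__main__":
    # With two items the result is simply the squared correlation between them.
    print(squared_multiple_correlations([[1.0, 0.6], [0.6, 1.0]]))
```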

The distributions of R and $R_B$ are almost all positively skewed,
while the distributions of E and $E_B$ are positively as well as negatively
skewed. If we e.g. define bimodality as a frequency of at least 10 for a
class which lies at least three classes away from one or more classes
with frequencies of at least 10, R has 2 such cases, E 8 cases, $R_B$
3 cases and $E_B$ 16 cases. Relative to the standard deviation the class
width is somewhat greater for R and $R_B$ than for E and $E_B$ but hardly
enough to produce the above differences in the number of bimodalities.
If so, E and $E_B$ seem to involve more cases where the subjects are
better separated in two groups.
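
The working definition of bimodality used above is easy to state as a rule on a list of class frequencies. The sketch below is one possible reading of it, with a hypothetical function name: "three classes away" is taken to mean that the class indices differ by at least three, which is an assumption about the wording.

```python
def is_bimodal(freqs, min_freq=10, min_gap=3):
    """True if two classes with frequency >= min_freq lie at least
    min_gap class positions apart (assumed reading of the definition)."""
    heavy = [i for i, f in enumerate(freqs) if f >= min_freq]
    # The two most distant heavy classes decide the matter.
    return bool(heavy) and heavy[-1] - heavy[0] >= min_gap

if __name__ == "__main__":
    print(is_bimodal([12, 3, 2, 11, 4]))   # heavy classes 0 and 3
    print(is_bimodal([12, 11, 10, 2, 1]))  # heavy classes are adjacent
```

With 60 subjects and a threshold of 10 per class, at most six classes can qualify as heavy, so the rule mainly picks out distributions that split the subjects into two clearly separated clusters.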
The arithmetic mean m has the following ranges and means: R:
$613 \le m \le 1396$ with mean 970, $R_B$: $490 \le m \le 1314$ with mean
790, E: $0.513 \le m \le 0.911$ with mean 0.724 and $E_B$:
$0.718 \le m \le 0.971$ with mean 0.849. We have, of course,
$m(R) \ge m(R_B)$ and $m(E) \le m(E_B)$ for every situation. For the
standard deviation s, R has $131 \le s \le 764$ with mean 351, $R_B$ has
$49 \le s \le 314$ with mean 163, E has $0.085 \le s \le 0.294$ with
mean 0.180 and $E_B$ has $0.048 \le s \le 0.236$ with mean 0.136. Here the
correction for deviations from the Bayes strategy always gives a
reduction of s(R) with $1.0 < s(R)/s(R_B) < 11.1$, but not so for s(E), where
$0.8 < s(E)/s(E_B) <$ [illegible] with 14 ratios less than 1.0. If anything,
this is an advantage for (E, $E_B$) over (R, $R_B$) because reduction of s
can be assumed to generate fewer differences between subjects.

Summing up, the analysis of the consequence variables has tried to
answer two questions. Firstly, do we need all four variables? According
to the correlations the answer is no: we need either R and R_B or E and
E_B. Secondly, are there any results which point to (R, R_B) or (E, E_B)?
There are scarcely such results in the analysis undertaken. We can
possibly take the fact that we have cases with s(E_B) > s(E). However,
the answer is in principle "no" and for this reason I choose (E, E_B),
as mentioned first in this part. Thus, the dependent variables used
later in this report will be k_c, n, E and E_B.

The experiment 

The treatment of the data builds heavily on the factorial design. Each
of the dependent variables E, E_B and n has its own ANOVA, partly for
the group of subjects and partly for every individual subject. We have
added an ANOVA on n for the results emanating from the statistical
model, but not so for E and E_B as all effects will here be trivially
zero. The above variables have also been used when a multivariate
procedure for identification of outliers is performed. The fourth
dependent variable, k_c, is analysed for linear relations with n, both
for each of the 27 situations and for every subject (and the statistical
model).

Group results on E and E_B

We have primarily analysed the group results with the help of ANOVA as
outlined by the experiment. This has been made for the total group and
its division according to sex, age and education. Significance tests have
been avoided and, instead, descriptive statistics of the different effects
in the form of ω² are presented. This index shows the proportion of
the total sum of squares which the sum of squares of an effect
constitutes, that is ω²_effect = SS_effect/SS_total. For a closer
presentation see e.g. Hays-Winkler (1971, pp. 728-730). The ANOVA gives
31 effects arising from five factors. These are D (different d-values),
P (different prior probabilities), C (different cost ratios c_0/c_1), T
(different replications) and S (different subjects). Only D, P and C are
regarded as proper independent variables of the experiment. Effects
containing T but not S inform us about the stability of the group of
subjects over replications. Effects containing S but not T inform us
about individual differences on several averages. Effects containing
both T and S will not be discussed. Likewise, ω² < 0.05 is considered
negligible and I think ω² should be at least 0.10 to be of any interest.
Of course, this is a wholly subjective statement, but one has to
determine a lower boundary and in an exploratory study this boundary
could be set rather high.
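The index can be illustrated on a much smaller design than the five-factor one used here. The following is a minimal sketch, assuming a balanced layout with only two factors (called D and P for familiarity) and replicates; the data and the function name are hypothetical. It computes each effect's share of the total sum of squares, which is the index as defined above.

```python
import numpy as np

def ss_proportions(y):
    """Share of the total sum of squares carried by each effect.

    y: balanced array indexed [d_level, p_level, replicate].
    Returns SS_effect / SS_total for D, P, the DP interaction and
    the within-cell remainder.
    """
    n_d, n_p, n_rep = y.shape
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()
    ss_d = n_p * n_rep * ((y.mean(axis=(1, 2)) - grand) ** 2).sum()
    ss_p = n_d * n_rep * ((y.mean(axis=(0, 2)) - grand) ** 2).sum()
    ss_cells = n_rep * ((y.mean(axis=2) - grand) ** 2).sum()
    ss_dp = ss_cells - ss_d - ss_p
    ss_within = ss_total - ss_cells
    return {
        "D": ss_d / ss_total,
        "P": ss_p / ss_total,
        "DP": ss_dp / ss_total,
        "within": ss_within / ss_total,
    }

# Hypothetical efficiencies: a D effect plus a little noise.
rng = np.random.default_rng(1)
d_effect = np.array([0.8, 0.7, 0.6])[:, None, None]
y = d_effect + 0.02 * rng.normal(size=(3, 3, 4))

props = ss_proportions(y)
print(props)
```

Because the design is balanced the shares add up to one, so an effect can be judged directly against a boundary such as 0.05 or 0.10.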

For the total group the ANOVA of E shows only one substantial effect
among the proper independent variables. This is the main effect D for
which ω² = 0.181. For d = 0.1, 0.2 and 0.3 we have the means 0.812,
0.737 and 0.591, respectively. This result is attributed to different
degrees of robustness for different d values. R(k, n) is in general
steeper around R_0 when d = 0.3 than when d = 0.1 for both dimensions
k and n, which often generates lesser efficiency for d = 0.3 than for
d = 0.1, given the same values of k_c - k_B and n - n_0. This result is
analogous to those of many probability revision experiments, where it
is said that greater diagnosticity (d-values) produces greater
conservatism (difference between, or other functions of, probability
according to Bayes's theorem and estimated probability). Only one
further effect is substantial, that of the main effect of S where
ω² = 0.180. We have 0.428 ≤ m ≤ 0.893 with mean 0.714, which I think is
quite a good variation for an absolute scale. Values of ω² just above
0.05 are found for the interactions SD, SP and SPC.

Compared with E, the ANOVA of E_B for the total group exhibits
raised ω² values for S and SD and a lower value for D, other things
being essentially the same as for E. For D we get ω² = 0.109 arising
from the means 0.904, 0.865 and 0.778. Comparing these values with
the corresponding ones for E, we find that the correction is most
beneficial for d = 0.3. Further, the values of E_B - E and 1 - E_B are
about the same for every d, meaning that the inefficiency is equally
caused by nonoptimal choice of n and k_c. For S we now have ω² = 0.270
with 0.644 ≤ m ≤ 0.966 and mean 0.849. For SD ω² = 0.162 which can be
illustrated by three D profiles which are most different among
themselves: (0.947, 0.929, 0.929), (0.585, 0.754, 0.591) and
(0.945, 0.699, 0.449). Thus, relative to the total sum of squares we
have a better differentiation of the subjects for E_B than for E.
(SS_total for E_B is about one half of SS_total for E.) No other effects
are over 0.05 and, especially, effects containing T but not S are far
below the 0.01 level. This is also true for E so that the group does
not change in behaviour from replication to replication. Or more exact,
their behaviour is such that the consequence of the behaviour is the
same from replication to replication.
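The split of the total inefficiency 1 - E into a part due to the choice of n (1 - E_B) and a part due to the choice of k_c given n (E_B - E) can be checked from the group means quoted above. The short sketch below only redoes that arithmetic; the variable names are chosen here for illustration.

```python
# Group means per d value, as quoted in the text.
d_values = [0.1, 0.2, 0.3]
mean_E = [0.812, 0.737, 0.591]
mean_EB = [0.904, 0.865, 0.778]

rows = []
for d, e, eb in zip(d_values, mean_E, mean_EB):
    loss_n = 1.0 - eb   # inefficiency from nonoptimal n
    loss_k = eb - e     # inefficiency from nonoptimal k_c, given n
    rows.append((d, loss_n, loss_k))
    print(f"d = {d}: 1 - E_B = {loss_n:.3f}, E_B - E = {loss_k:.3f}")
```

The two parts come out as 0.096 and 0.092 for d = 0.1, 0.135 and 0.128 for d = 0.2, and 0.222 and 0.187 for d = 0.3, so both grow with d while staying of similar size.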

We may construct an average subject through calculating means over
the sixty individual subjects. The ANOVA of this average subject can be
deduced from the ANOVA of the total group if all effects containing S
are ignored. Doing this, we get values of ω² which are small for all but
two effects. For E and E_B we have ω²_D equal to 0.757 and 0.671 and
ω²_DPC equal to 0.129 and 0.196, respectively. Thus the efficiency of
the average subject is very dependent on different d values.

No computer program was available which could incorporate sex, age
and/or education as extra factors in the above design, because the
number of cells became too large. I have therefore made ANOVAs as
before, one for men, one for women, etc., which is a little
unsatisfactory as e.g. all effects involving sex cannot be directly
evaluated. Anyhow, it seems to me that the new ANOVAs tell approximately
the same story as did the ANOVAs of the total group. Thus, I will
comment briefly upon the results.

Concerning sex, men have ω²_D = 0.136 and ω²_S = 0.206 and women
have ω²_D = 0.244 and ω²_S = 0.149 for E, while for E_B we have 0.054,
0.270, 0.192 and 0.235, respectively. For E_B we further have
ω²_SD = 0.190 for men and ω²_SD = 0.115 for women. Other effects have
ω² not greater than 0.066 and often are much smaller. Relative to their
own sex, men are less affected by different d values than women are and
are more differentiated in their means. However, the mean efficiency is
about the same for both sexes, being 0.725 for men and 0.701 for women,
concerning E, and 0.873 and 0.820, respectively, for E_B. For both
sexes E_B - E is greatest for d = 0.3 and at least here we have a
pronounced difference: while men have the same value of E_B - E and
1 - E_B (a small positive difference), women's inefficiency is more
related to the choice of n than to the choice of k_c, given n:
E_B - E - (1 - E_B) is -0.120.

The total group is divided into three age groups, i.e. A1: at least 30
years old, A2: between 25 - 29 (inclusive) and A3: at most 24 years
old. For E one finds ω²_D = 0.204, 0.206 and 0.166 for A1, A2 and A3,
respectively, and the corresponding values for ω²_S are 0.147, 0.163 and
0.184. Three other effects have 0.080 ≤ ω² ≤ 0.090 for A1, but we have
as a whole no difference between the age groups. The case of E_B has
ω²_D = 0.152, 0.082 and 0.129, ω²_S = 0.287, 0.282 and 0.233 and
ω²_SD = 0.166, 0.127 and 0.180. Again we have the same picture: ω²_D
goes down and ω²_S and ω²_SD rise, when E is replaced by E_B, although
in somewhat different degrees for A1, A2 and A3. No total mean
differences between the groups are discovered; E gives 0.662, 0.700 and
0.735 while E_B gives 0.811, 0.844 and 0.861, but the trend is that the
younger subjects are a little more efficient. As for the sexes, E_B - E
and 1 - E_B grow with increasing d values and E_B - E is in most cases
slightly smaller than 1 - E_B. There are two exceptions: for d = 0.2
the A2 group is much more affected by the choice of k_c than by the
choice of n, E_B - E - (1 - E_B) being 0.103. For d = 0.3 and A1 the
corresponding value is -0.095.

The total group has also been classified as to type of academic study
with special regard to mathematics and statistics. The three groups are
E1: humanities, E2: social sciences and E3: natural sciences, the
distribution of which was given in appendix 2. One may assume that good
knowledge of mathematics and statistics will produce better efficiency
than little such knowledge. This hypothesis has been examined before,
see e.g. Kogan & Wallach (1964). Although there are overlaps, it is
reasonable to suppose that E1 has the least mean knowledge, E3 the
greatest mean, while E2 will take a middle position. For E,
ω²_D = 0.244, 0.186 and 0.110 for E1, E2 and E3, respectively. We have
further ω²_S = 0.142, 0.187 and 0.119. For E_B, ω²_D = 0.218, 0.084 and
0.074, ω²_S = 0.190, 0.308 and 0.107 and ω²_SD = 0.150, 0.150 and 0.177.
With the exception of E3 for factor S the same picture reappears: ω²_D
becomes smaller and ω²_S and ω²_SD become greater. However, there are
greater numerical differences here than for the other classifications.
For instance, E1 is much more affected by different d values than E3 is
and E2 has more differentiated individual means than E3 has. Other
effects are small, although E3 has some minor ones, e.g. ω²_DPC = 0.062
and 0.090 for E and E_B, respectively. The total means are 0.713, 0.695
and 0.773 for E and 0.824, 0.846 and 0.892 for E_B. Thus, the hypothesis
about knowledge of mathematics and statistics is in line with the above
means, but the differences in these seem to be too small for a real
confirmation of the hypothesis. Again we have increasing values of
E_B - E and 1 - E_B for increasing d values for all three groups. For
d = 0.2 and 0.3 E1 has 1 - E_B > E_B - E, while for the other groups the
choices of k_c and n produce about the same inefficiency. Notice that
there is a certain correspondence between sex and education: the eleven
students of natural sciences include ten men, while the sixteen students
of humanities have only four men. In fact E1 and women have many similar
results on ANOVA and E3 and men have some corresponding results.

Group results on k_c and n

The main results come from ANOVA on n and the linear relation between
k_c and n, both analyses for the total group only. The ANOVA shows only
one substantial effect, that of S which has an ω² of 0.537 and this
refers to means between 0.0 and 91.0 with a total mean of 21.1. No
other effects give ω² greater than 0.05. The sum of ω² for the proper
independent variables D, P and C is 0.022 and the corresponding sum
for effects containing T but not S is 0.003. We can certainly say that,
relative to the variations between the subjects, the choice of n is
constant over replications and scarcely dependent on the different
situation parameters.

Looking at the average subject, whose ANOVA contains the above
sums of squares, which do not contain S, we find three ω² of some size.
These are ω²_D = 0.134, ω²_P = 0.458 and ω²_PC = 0.103. For D the means
are 23.2, 20.6 and 19.5 and for P we have 19.1, 25.1 and 19.0, where
the first mean of each effect corresponds to the lowest level, and so
on. Concerning PC we have (21.9, 17.8, 17.6), (25.4, 24.7, 25.2) and
(17.0, 16.9, 23.0) for the simple C effects of P(H_0) = 0.3, 0.5 and
0.7, respectively. Comparing with the statistical model, the total means
are almost the same: 20.6 due to the model and 21.1 for the average
subject. The model produces three ω² above 0.100, i.e. ω²_D = 0.676 and
two further values of 0.142 and 0.155. Also, the sum of ω² of effects
containing T is zero while the average subject gives a sum of 0.123.

The greatest difference between the model and the average subject
comes from the choice of n for d = 0.1, where the model has a mean
of 7.4. This arises from the fact that for the asymmetric situations,
when d = 0.1, the impact of the observations is so slow that it is
optimal to choose a hypothesis without paying any observations:
R(k_B, n) > R(k_0, 0). That subjects disagree with the model in this way
for similar situations has been verified before, see e.g. Larsson (1968)
and Snapper & Peterson (1971). However, as most functions R are flat
around R_0 for d = 0.1 the inefficiency of this disagreement is in
general insignificant. As we shall see later, the above "wrong" choices
of the average subject are not valid for all individual subjects.

The model and the average subject both behave in the same way for
different prior probabilities, although the average subject has a
greater variance of the means. (Notice that for an effect i we have
ω²_i(a)/ω²_i(b) = [SS_i(a)/SS_i(b)] [SS_t(b)/SS_t(a)], where SS_t is the
total sum of squares and a and b denote two persons, etc. If we let a
stand for the average subject and b for the model we have the following
relation for the main effect P: 0.458/0.014 = 660/150 × 10885/1441.
Thus the great ratio between the ω² is dependent on a greater variance
of the P means of the average subject and his lesser total variance.)
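The relation can be verified numerically from the four sums of squares quoted in the parenthesis. In the sketch below the assignment of the numbers (660 and 1441 to the average subject, 150 and 10885 to the model) is read off from the order of the terms in the formula and is thus an interpretation rather than something stated separately in the text.

```python
# Sums of squares for the main effect P and in total.
ss_p_avg, ss_t_avg = 660.0, 1441.0        # a: the average subject
ss_p_model, ss_t_model = 150.0, 10885.0   # b: the statistical model

omega_avg = ss_p_avg / ss_t_avg        # share of P for the average subject
omega_model = ss_p_model / ss_t_model  # share of P for the model

ratio_direct = omega_avg / omega_model
ratio_factored = (ss_p_avg / ss_p_model) * (ss_t_model / ss_t_avg)

print(round(omega_avg, 3), round(omega_model, 3))
print(round(ratio_direct, 1), round(ratio_factored, 1))
```

The two shares reproduce 0.458 and 0.014, and the ratio is about 33: a factor of 4.4 comes from the greater sum of squares for P and a factor of about 7.6 from the smaller total sum of squares of the average subject.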

The PC effects show about the same patterns; the exception is C for
P(H_0) = 0.5, where the average subject produces a horizontal profile
and the model a triangular one. The model, however, has a greater
variation than the average subject. Finally, the DPC effect of the model
is ordinal, meaning that the different simple interactions of PC show
the same pattern but with different spreads, e.g. the three profiles of
P(H_0) = 0.3 are all non-increasing for increasing C levels.

As the Bayes strategy implies that k_B is a linear function of n it
has been natural to me to analyse k_c as a linear function of n: to what
extent and how can we express k_c as A + Bn? For the model all 27
situations give B = 0.5 with -3.34 ≤ A ≤ 3.84 for d = 0.1,
-1.90 ≤ A ≤ 1.90 for d = 0.2 and -1.24 ≤ A ≤ 1.24 for d = 0.3. Due to
symmetry we have the same A value but with reversed sign when a
situation with parameters (d, c_0/c_1, P(H_0)) is replaced by a
situation with parameters (d, c_1/c_0, 1 - P(H_0)).

The linear relation between k_c and n has been analysed for each of the
27 situations with at most 180 cases (60 persons times 3 repetitions) for
every situation. Cases with n equal to zero have been deleted, as k_c
has no numerical value for these cases. This means that the number of
cases varies between 114 and 169. The linear relation is clear: the
correlations between k_c and n are such that 0.787 ≤ r ≤ 0.995 with a
mean of 0.950. For the A and B values we have -0.566 ≤ A ≤ 1.840 with a
mean of 0.587 and 0.233 ≤ B ≤ 0.488 with a mean of 0.402. It is thus
obvious that persons tend to underestimate k_B, at least when n is
great. In fact, there are situations where persons underestimate k_B
for all n > 0. This behaviour is an important factor when explaining
inefficiency: nonoptimal choice of great n values is combined with bad
choice of k_c. Considered as a group, these persons have a clear bias
against the hypothesis with the smaller p value, at least for great n.
Why this is so is difficult to understand. A tentative explanation is
that most subjects overestimate the information of a "1" in relation to
the information of a "0". Another concerns the instruction given to the
subjects: "I choose H_0 if the number of ones is less than or equal to
...". Perhaps we had got the opposite bias if the instruction had been
"I choose H_1 if the number of ones is greater than ...". The bias is
related to the factors D and P, such that the bias is greater for
greater d values and smaller for greater P(H_0) values. (We have B equal
to 0.440, 0.407 and 0.360 for D levels and 0.350, 0.412 and 0.444 for P
levels.) The relation to P is quite "reasonable", and similar to the
statistical model, but the relation to D is harder to suggest
explanations for. Anyhow, this relation also generates inefficiency
because of the lesser robustness to deviations from k_B for d = 0.3
than for d = 0.1.
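The fit of k_c = A + Bn for one situation can be sketched as an ordinary least-squares line after dropping the cases with n = 0. The code below is an illustration only: the function name and the six data points are hypothetical, and the report does not describe its fitting routine beyond calling the relation linear.

```python
import numpy as np

def fit_critical_value_line(n, k):
    """Least-squares fit of k = A + B*n, ignoring cases with n == 0.

    Returns (A, B, r, s), where r is the correlation between n and k
    and s is the standard deviation of k about the regression line.
    """
    n = np.asarray(n, dtype=float)
    k = np.asarray(k, dtype=float)
    keep = n > 0                      # k is undefined when n == 0
    n, k = n[keep], k[keep]
    B, A = np.polyfit(n, k, 1)        # polyfit returns the slope first
    r = np.corrcoef(n, k)[0, 1]
    resid = k - (A + B * n)
    s = np.sqrt((resid ** 2).sum() / (len(n) - 2))
    return A, B, r, s

# Hypothetical choices; the first case has n = 0 and hence no k value.
n_obs = [0, 5, 10, 20, 40, 100]
k_obs = [np.nan, 2, 4, 8, 16, 40]

A, B, r, s = fit_critical_value_line(n_obs, k_obs)
print(A, B, r, s)
```

In this made-up example the slope is 0.4, i.e. below the value 0.5 of the model, which is the kind of underestimation of k_B for great n that is described above.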

Let us again look at an average subject. You may imagine him in the
following way: Every subject chooses an n value and a k_c value for
every situation and repetition, and the average over subjects and
repetitions constitutes the choice of the average subject for a certain
situation. This means an average n which is calculated from 180 cases.
As some of the k_c values are non-numerical the average k_c value is
calculated from A + Bn, where A and B are the average estimated
parameters discussed above and n is here the average n value just
mentioned. (Strictly speaking, n is the nearest integer to this average
n and k_c is then the greatest integer less than or equal to A + Bn.)
For most situations both E and E_B are greater for this average subject
than for the average E values of the subjects. This was expected since
most efficiency curves, as functions of n or k and n, are concave. The
differences are greater for d = 0.3 than for d = 0.1, especially for
E_B, because the concavity is more pronounced for greater d values. We
have 0.481 ≤ E ≤ 0.991 with a mean of 0.780 and 0.775 ≤ E_B ≤ 1.000
with a mean of 0.954. The inefficiency is almost always little dependent
on the choice of n. Thus, this type of group decision will in general
improve on the choice of the amount of information but not on how to use
it. However, exceptions from this "rule" can be found for certain
situations and there are also certain subjects who are more efficient
than this average subject (or group decision).

Individual results on E and E_B

For every subject there is an ANOVA with factors D, P, C and T
(compare the group ANOVA), both for E and E_B. Effects are considered
nontrivial only if ω² is greater than 0.100. There is great variation
between subjects, showing from zero to five substantial effects in their
ANOVAs; about half of them show two effects. The commonest one is
D, then comes DPC, just as for the average subject, constructed by
collapse of the group ANOVA. While there are subjects with about the
same pattern as this average subject, there are also subjects with
totally different "styles", e.g. the one with no substantial effect.
This does not mean that he behaves like a statistician: the average
efficiency can be far from 1.000 and/or his variation, concerning
efficiency and therefore his choice of k_c and n, from repetition to
repetition may be great. This is in fact the case for the subject with
zero effects.

It is almost impossible to go into details of every ANOVA. I have
instead selected some ways of description to highlight individual
differences. One of these ways concerns the identification of outliers,
which has been performed by a multivariate technique based on the
Mahalanobis distance. This has been done for every repetition, for E as
well as for E_B. The method selects the subjects (if any) who are "too
far away" from the group centroid in the 27-dimensional space, which
constitutes the space where subjects are represented as points for our
case. (See Dixon (1970), pp. 104-112.) The selection, of course, results
in a more homogeneous group, as concerns the remaining subjects. They
are also better: the subjects deleted are in general inefficient and
this is valid for E and E_B. It is not always the same subjects who are
selected in the six cases, and those who are may come in different order
from case to case. (Let rank order 1 denote that the subject is selected
first, and so on.) I have looked at the ANOVA results for the subjects
who have the five lowest average ranks, partly for E and partly for E_B.
We have six subjects totally; four persons are the same for both
dependent variables. One of them has means which correspond to the group
means, but the others are far below these levels.
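The idea behind the outlier identification can be sketched as follows. This is not the program referred to in Dixon (1970), whose selection rule is not reproduced here; it is a minimal sketch that merely computes each subject's Mahalanobis distance from the group centroid and ranks the subjects by it. The data are simulated and the function name is invented for the illustration.

```python
import numpy as np

def mahalanobis_from_centroid(x):
    """Mahalanobis distance of every row of x from the centroid.

    x: array of shape (subjects, situations). The covariance matrix
    is estimated from all rows, including potential outliers.
    """
    diff = x - x.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(x, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(d2)

# Hypothetical efficiencies: 60 subjects in 27 situations, where
# subject 0 is made clearly less efficient than the rest.
rng = np.random.default_rng(2)
eff = rng.normal(0.7, 0.1, size=(60, 27))
eff[0] -= 0.5

dist = mahalanobis_from_centroid(eff)
order = np.argsort(dist)[::-1]     # rank order 1 = farthest away
print(order[:5])
```

With 27 dimensions and only 60 subjects the covariance estimate is itself pulled towards any extreme subject, so distances of this kind are best read as a rough ordering.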

It has been stated earlier that for most situations we have s(E_B) <
s(E) and that SS_total(E_B) is about 50 per cent of SS_total(E) for the
group ANOVA. The same reduction is, as a rule, also found for the
individual ANOVAs. There are subjects whose SS_total(E_B) is only 5 per
cent of SS_total(E), depending on ceiling effects: the correction for
deviations from the Bayes strategy makes the efficiency values high. On
the other hand there are subjects with no reduction and three of those
are among the above-mentioned outliers. One may expect that E_B shows
smaller variance than E, since one of the causes for inefficiency has
been removed, and this is usually true. But if a subject almost always
chooses k_c = k_B or if his k_c choice is very varied there need not be
any reduction; on the contrary, there can be an increase of SS_total.
The three outliers are of both types: one subject has 1 - E_B and
E_B - E equal to 0.333 and 0.051, while the others have (0.356, 0.183)
and (0.335, 0.237) and thus are inefficient when choosing n as well as
k_c. While 1 - E_B and E_B - E are of the same magnitude for the total
group (0.151 and 0.133, respectively), we find great variations among
the subjects. All four types are represented: good at both k_c and n
choice (example: 0.067 and 0.050), good at k_c and bad at n choice
(example: 0.333 and 0.051), bad at k_c and good at n choice (example:
0.086 and 0.279) and bad at both choices (example: 0.335 and 0.237). If
we make median splits for 1 - E_B and E_B - E the cell frequencies of
the fourfold table are 18, 12, 12 and 18 which means a smaller negative
correlation than I had expected with regard to the construction of the
variables.
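The size of the correlation implied by the fourfold table can be worked out with the phi coefficient. In the sketch below the placement of the frequencies is an assumption: since the correlation is said to be negative, the two cells with 18 subjects are taken to be the discordant ones (low on one variable and high on the other).

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi coefficient of the 2x2 table [[a, b], [c, d]].

    a and d are the concordant cells (low/low and high/high),
    b and c the discordant ones.
    """
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Median splits give 30 subjects per row and per column.
phi = phi_coefficient(12, 18, 18, 12)
print(phi)
```

Under this reading phi is -0.2, a weak negative relation between the two sources of inefficiency.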

The six outliers are all but one worse than the average, as concerns
the choice of the number of observations. All of them show about the
same ω² profile: they have fewer substantial effects and these are
lower than average. We have, with results from all individual ANOVAs
within parenthesis, for E the mean number of substantial effects equal
to 1.00 (2.17) with 0.102 ≤ ω² ≤ 0.210 (0.100 ≤ ω² ≤ 0.755) and for E_B
the corresponding results are 1.17 (2.02) and 0.101 ≤ ω² ≤ 0.328
(0.101 ≤ ω² ≤ 0.788). This implies that every outlier has small
differences between means and/or great variations over repetitions. The
first cause is more valid for E and the second one for E_B. (They have
about average SS_total for E but above the average for E_B.) They tend
to act like random number generators, when it concerns the choice of n:
sometimes they hit the target and sometimes they are far from the
optimal number.

The variation between subjects is not the same from situation to
situation. For factor D the greatest differentiation is obtained for
d = 0.3 with standard deviations (0.169, 0.180, 0.205) for E and (0.109,
0.125, 0.190) for E_B. This is a reasonable result, as the robustness
of efficiency, as to choices of k_c and n, is greater for d = 0.1 than
for d = 0.3. Hence, a certain variation of choices causes greater
variation for the greater d value. No more systematic effect can be
discovered for E_B, but for E there is another effect, which can be seen
for P and C and which is very pronounced for PC. With increasing
P(H_0) values we have, for increasing c_0/c_1 values, (0.169, 0.182,
0.233), (0.170, 0.155, 0.175) and (0.223, 0.179, 0.143). It is quite
evident that asymmetrical situations produce greater differences
between subjects than more symmetrical situations. As this is not the
case for E_B, the fact must be caused by the choice of k_c. The figures
0.233 and 0.223 refer to the situations where H_0 is both probable and
cheap (when wrongly chosen) and where H_0 is both improbable and
expensive, respectively. There are obviously more different opinions
as to how to choose k_c when both determinants "go in the same
direction". Perhaps the smaller variations in the opposite situations
are due to some general reasoning like "the two factors will balance
each other so I should choose k_c near n/2"?

Individual results on k_c and n

For every subject there is an ANOVA of n like those for E and E_B. We
also have an analysis of the linear relation between k_c and n for every
subject, including the statistical model. We have again used the method
for detection of outliers, as concerns the choice of n. Let us again
select the subjects with the five lowest average ranks. It is
interesting to notice that three of the outliers from E and E_B reappear
(the three subjects which have no reduction of SS_total(E_B)). Some of
the characteristics of the outliers for n are the following, with
corresponding results for all individual ANOVAs within parenthesis: they
make many observations, 29.9 ≤ m(n) ≤ 91.0 (0.0 ≤ m(n) ≤ 91.0), they
have the five greatest SS_total, they have an average number of
ω² ≥ 0.100 of 1.4 (1.3), but these are small, 0.106 ≤ ω² ≤ 0.251
(0.100 ≤ ω² ≤ 1.000). Thus, the outliers make too many observations,
have no pronounced strategies for the choice of n and lie below the 20th
percentile on both E and E_B; in fact we find the worst subject on each
efficiency variable among these outliers.

The individual ANOVAs for n have together about half as many
effects with ω² ≥ 0.100 as the ANOVAs for E and E_B, and they are
otherwise distributed. Most common effects are D, P and PC. There
are no more subjects with distinct ω² profiles here than for E and
fewer than for E_B. (7, 6 and 15, respectively, if we define a distinct
profile as one with either a sum of the substantial (ω² ≥ 0.100)
effects greater than 0.750 or one which has a single effect greater
than 0.600.) The strategies of information purchase as illustrated by
the ω² profiles (or even the distinct ones) are quite different
between the subjects. One subject is most sensitive to D (ω² = 0.525),
three others concentrate on P (ω² = 0.659, 0.779 and 0.794), another
one on C (ω² = 0.901), while one is totally absorbed by PC (ω² = 1.000)
and the other subjects more or less have strategies which take into
consideration more than one effect. There are ten subjects with no
substantial effect at all and hence with no strategy, except a random
one. (Another two subjects always make the same number of observations,
which implies that their ω² values are not defined.) I may also
inform the reader that the statistical model produces a strategy, which
concentrates on D (ω²_D = 0.676).

The distributions of n are all positively skewed. Surprisingly many
subjects have chosen n equal to zero (between 2 and 24 per trial), but
at the same time there are almost always choices of at least 100
observations (between 0 and 8 per trial). The standard deviations are
therefore of the same magnitudes as those of the means. We have
11.5 ≤ s ≤ 42.6 with an average s of 25.8. According to the factorial
design there is but one effect of s. For increasing D levels the means
of s are 30.9, 24.0 and 20.7, which seems reasonable, for the following
reason. If we could plot n as a function of d (average over P and C) for
every subject and many d values, we probably would obtain curves
which were unimodal and with n equal to zero when d is zero and one.
The position of the maximum n value and the average n of a curve are
different from subject to subject. From this we will expect small s
values for very low and high d values. In our case we can expect still
smaller s values for d greater than 0.3 than the s value for d equal to
0.3. If we had made d smaller than 0.1 we could expect that more and
more subjects would ultimately realize the futility of making any
observations. Although the curve generated from the statistical model
has its maximum n value when d is about 0.2, only a few subjects have
the same type of a D profile. More than half of the subjects have
profiles which vary less than five units of n. Another 15 subjects have
profiles where n decreases for increasing d values.

The linear relation between k_c and n has been analysed for each of
the subjects with at most 81 cases for a subject. When n is zero the
case has to be deleted as k_c is non-numerical here. The number of
cases varies between 0 and 81, but only five subjects have fewer than
45 cases. The linear relation is more or less evident from subject to
subject: we have -0.022 ≤ r ≤ 0.997 with a mean of 0.820. There are
23 subjects with r greater than 0.900 and only ten subjects with r less
than 0.700, and five of these values depend on s²(n) being zero or very
close to zero. For the A and B values (k_c = A + Bn) we have -8.094 ≤
A ≤ 9.500 and -0.050 ≤ B ≤ 0.866 with means of -0.039 and 0.426,
respectively. We have, on the average, the same results here as for
the total group: the individual subjects are in general biased against
the hypothesis with the smaller p value. However, the differences
between subjects are great. We have a few subjects who are biased
against the other hypothesis, some subjects are not biased, while some
subjects are so biased against the hypothesis with the smaller p value
that they always choose k_c smaller than n/2. The standard deviation of
k_c, given a particular n value, also varies greatly between subjects:
0.2 ≤ s ≤ 11.3, where s stands for the standard deviation of k_c about
the regression line. Four of the above mentioned five outliers for n
have the four greatest s values and they are all biased against the
hypothesis with the smaller p value. Only one of them belongs to the
subjects with the five greatest values of E_B - E. The latter subjects
have rather great s values but three of them are not biased. Why this
is so cannot be settled by the analysis here. Perhaps these three
subjects choose k_c far from k_B for situations which are not robust
to deviations from k_B, but I do not know. It can be added that the
statistical model gives an r value of 0.921 with a standard deviation
about the regression line of 0.9 and that A = -0.500 and B = 0.500.
(A is different from zero, because k_B is an integer and this produces
a bias.)
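
The per-subject regression described above can be sketched as follows. This is only an illustration, not the program used in the study: the function name, the use of NumPy and the divisor N - 2 for the residual standard deviation are assumptions, and the data in the usage lines are invented.

```python
import numpy as np

def fit_kc_on_n(n, k_c):
    """Least-squares fit of k_c = A + B*n for one subject.

    Cases with n == 0 are dropped, because k_c is undefined when no
    sample is taken. Returns A, B, the correlation r and the standard
    deviation s of k_c about the regression line.
    """
    n = np.asarray(n, dtype=float)
    k_c = np.asarray(k_c, dtype=float)
    keep = n > 0
    n, k_c = n[keep], k_c[keep]
    B, A = np.polyfit(n, k_c, 1)          # slope first, then intercept
    r = np.corrcoef(n, k_c)[0, 1]         # undefined (nan) if n never varies
    resid = k_c - (A + B * n)
    s = np.sqrt(np.sum(resid ** 2) / (len(n) - 2))  # assumed divisor N - 2
    return A, B, r, s

# Invented cases lying exactly on k_c = -0.5 + 0.5 n, plus one n = 0 case.
A, B, r, s = fit_kc_on_n([0, 1, 3, 5, 7], [float("nan"), 0, 1, 2, 3])
print(A, B, r, s)
```

A subject whose n hardly varies gives an unstable or undefined r, which matches the remark that some low r values depend on s²(n) being close to zero.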

The group testing 

This part deals with comparisons between the experiment and the
decision group test and the intelligence tests. The presentations
concern group results only, and for the experiment as well as for the
decision test the dependent variables are limited to E and E_B. The
comparisons use means, standard deviations and correlations. The
correlations are further analysed by the use of canonical correlation
analysis and factor analysis. Reliability is also discussed.

As has been stated before, not all 60 subjects took the tests.
The results of the decision test are based on 57 full records, while
the results of the intelligence tests comprise only 56, since another
subject had to be deleted. Looking at the results of the experiment,
the greatest differences between those deleted and the total group are
found for E, as concerns the decision test. (Means of 0.623 and 0.714,
respectively.) If we suppose no change of results from the experiment
to the decision test, the deletion will cause an increase of the total
mean to 0.719, which can be considered negligible. A still smaller
effect may be expected for standard deviations. E.g. the standard
deviation of the subjects' means of E will, under the above assumption,
not change more than 0.001. On the whole I do not think that this five
to seven per cent of non-response is anything to worry about: differences
of results between the decision test and the experiment are hardly
due to differences between the 57 subjects and the 60 subjects.
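
The effect of the deletion on the mean of E can be checked with a few lines of arithmetic. The function name is invented for this sketch; the figures are the ones quoted above (60 subjects with mean 0.714, three deleted subjects with mean 0.623).

```python
def mean_after_deletion(total_mean, n_total, deleted_mean, n_deleted):
    """Mean of the remaining cases once a subgroup is removed."""
    remaining = n_total - n_deleted
    return (total_mean * n_total - deleted_mean * n_deleted) / remaining

# 60 subjects with mean 0.714; the 3 deleted subjects have mean 0.623.
print(round(mean_after_deletion(0.714, 60, 0.623, 3), 3))  # 0.719
```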

The decision test and the experiment

Reliability 

We will begin with some viewpoints on reliability. Every situation can
be regarded as an item and for each repetition of the experiment, as well
as for the decision test, there are possibilities for observing the
reliability of an item and of the sum score of the 27 items. This can be
done in several ways, both with regard to the definition of reliability
and the estimation of reliability. We have, in principle, three
populations: those of subjects, items and actions. No generalizations
will be made as to a population of subjects, since the subjects of this
study cannot be regarded as a random sample. Nor will generalizations be
made as to a population of items. The definition of this is in general
very difficult, but we have the unusual possibility of defining the
population unequivocally according to factors D, P and C. However, the
27 situations selected are hardly a random sample from such a
population. As a consequence of this the above factors have been
regarded as fixed for the ANOVAs. The only population left is difficult
to discuss, because it is not obvious how to define a random sample. So,
strictly speaking, there are no generalizations for the reliability
values, which is in accordance with what has already been stated about
the study at large. On the other hand, I think it is reasonable to
expect the same kind of results as have been found here, if you
replicated the experiment, even if you chose some other levels of D, P
and C and, perhaps, also with other, similar subjects.

When we speak about reliability here, we refer to the classical
model, see e.g. Lord & Novick (1968, ch. 3). Let us start with the
item reliabilities. Two measures are used: one internal measure
(within a set of 27 items) and one measure based on correlations
between the sets. The internal index is the squared multiple
correlation between an item and the other 26 items. If the number of
subjects is very great in relation to the number of items, the squared
multiple correlation R² is a lower limit - and perhaps a bad one - of
the reliability. However, when the number of items approaches the
number of subjects, R² will approach 1. A common correction for this
bias is based on the unbiased estimate of the residual variance, see
e.g. Darlington (1968). I believe that the corrected values better
reflect facts, because they seem less affected by the relation between the
number of items and the number of subjects. (Notice that this
correction need not concern inference: the same bias is obtained whether we
call our subjects a sample or a population. According to Dempster
(1969, p. 161) "theoretical understanding of this phenomenon of
diminishing returns for variables introduced remains imperfect,
The following squared multiple correlations are obtained (with
uncorrected values within parentheses). The decision test has 0.145
(0.542) ≤ R² ≤ 0.763 (0.873) with a mean of 0.504 (0.735) for E and
0.033 (0.482) ≤ R² ≤ 0.875 (0.934) with a mean of 0.633 (0.804) for
E_B. The experiment has 0.055 (0.472) ≤ R² ≤ 0.862 (0.924) with a
mean of 0.500 (0.721) for E, and for E_B we have 0.309 (0.614) ≤ R² ≤
0.962 (0.980) with a mean of 0.756 (0.864). Two results are obvious:
the reliability of E_B is better than that of E and the reliability of the
items of the decision test is equally good as for the experiment when
it concerns E but lower for E_B (on the average). In spite of the smaller
standard deviations of E_B, the reliability is greater here, because
there is only one unreliable determinant: the choice of n. However,
this is not always so; in 11 cases out of 108 we have the reverse
relation. The average item reliability must be regarded as good.
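
The corrected and uncorrected values quoted above are numerically consistent with the usual adjustment 1 - (1 - R²)(N - 1)/(N - p - 1), with p = 26 predictor items and N = 57 subjects for the decision test or N = 60 for the experiment. The sketch below only checks that consistency; the report does not print the formula it used, so the formula, the function name and the subject counts per analysis are assumptions.

```python
def corrected_r2(r2, n_subjects, n_predictors):
    """Shrink R² using the unbiased estimate of the residual variance."""
    factor = (n_subjects - 1) / (n_subjects - n_predictors - 1)
    return 1.0 - (1.0 - r2) * factor

# Decision test, E: uncorrected range 0.542 to 0.873, 57 subjects, 26 other items.
print(round(corrected_r2(0.542, 57, 26), 3))  # 0.145
print(round(corrected_r2(0.873, 57, 26), 3))  # 0.763
```

Small differences in the third decimal for other pairs are expected, since the uncorrected values are themselves rounded.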

The correlations between replications of the situations can be
regarded as (modified) retest correlations. For the experiment the
modification consists of the items being presented to the subjects in
different random orders. The decision test is so different from the
experiment that I hesitate to call the correlations between this test and a
repetition of the experiment retest correlations. Yet I give them -
they may have interest as lower boundaries. The decision test has
-0.085 ≤ r ≤ 0.566 with a mean of 0.267 for E and -0.051 ≤ r ≤ 0.606
with a mean of 0.306 for E_B. The corresponding values of the
experiment are 0.255 ≤ r ≤ 0.824 with a mean of 0.509 and 0.173 ≤ r ≤
0.902 with a mean of 0.620. The experiment shows the same average
(for E) here as the average of R², while the r mean of E_B is smaller
than the R² mean. As was expected the correlations are smaller for
the decision test.

It can be added that the reliability values calculated from the group
ANOVAs on E and E_B give average item reliabilities of 0.372 and 0.528,
respectively. An interesting feature is that the corresponding value for
n is 0.705. This can depend on two things. Since the reliability
calculated from the ANOVA is an intraclass correlation, it is only equal to
the average correlation between repetitions if all 81 variances are equal.
So different deviations from this may give the difference between 0.705
and 0.528, even if correlations between repetitions are, on the average,
of the same magnitude for n and E_B (E is not quite comparable here,
as it is also dependent on the choice of k_c). The other cause is more
credible to me, and that is that n is more reliable than E_B. One
indication of this is the high correlations between n and k_c with an average
of 0.950. According to the classical theory such a correlation is a
lower boundary of the geometric mean of the reliabilities of n and k_c
which, by the way, shows that k_c also has a high reliability. I think it
is reasonable to expect higher reliability on a choice variable than on
a consequence variable. The latter is a transformation of the former,
which sometimes (not here) involves unreliability in itself, e.g. in
subjective judgments of different kinds. But even when the
transformation can be mathematically defined and the choice variable has a
retest correlation of 1, the corresponding correlation for the
consequence variable will only in special cases have the value 1.

The attentive reader has from the above already anticipated that the
reliability of the sum score of the 27 items (situations) should be great
and so it is. Two different types of values are used: the general
reliability of a composite measurement and one of its special cases, the
so-called Cronbach's alpha coefficient. I have used corrected R²
values as item reliabilities for the first type of values. The
presentation is, for each dependent variable and type of value, in the following
order: the decision test, then repetitions 1, 2 and 3 of the experiment.
The general values are 0.929, 0.901, 0.940 and 0.945 for E and
0.963, 0.962, 0.979 and 0.978 for E_B. The alpha coefficients are 0.885,
0.872, 0.904 and 0.899 for E and 0.915, 0.914, 0.938 and 0.927 for
E_B. We see the same picture for both types of values: E_B has higher
reliability than E and the decision test has almost as high reliability
as the experiment. The alpha coefficients are smaller than the general
values, which is the normal case, since the alpha coefficient is the
general value with average item variance of true score estimated by
the average covariance between items. The latter value can never
exceed the former value and the alpha coefficient is therefore,
according to the classical reliability theory, a conservative measure
which can be quite useless if the covariances are small. However,
both types of values may be too high because of a violation of the
assumption that measurement errors of items are linearly independent. (I
think that this assumption is more realistic for the experiment than
for the decision test, as every subject has its own sequence of items.)
In the light of this fact, the alpha coefficient may be a more "reliable"
reliability measure, since its conservatism may balance the above
violation. Anyhow, the reliability of E and E_B is high. Finally, I can
mention that the group ANOVAs give reliability values for sum scores
of E, E_B and n of 0.941, 0.968 and 0.984, respectively. It can also
be mentioned here that the distributions of the sum scores are more
regular than the distributions of the single items. The distributions
of the sum scores are negatively skewed, but only slightly, with E
having somewhat lower means and higher standard deviations than
those of E_B.
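
For readers who want to reproduce this kind of figure, a minimal sketch of Cronbach's alpha for a subjects-by-items score matrix follows. It also includes the Spearman-Brown step-up formula, because the ANOVA-based sum-score reliabilities quoted above (0.941, 0.968, 0.984) agree closely with that formula applied to the average item reliabilities (0.372, 0.528, 0.705) for 27 items. The report does not say that this formula was used, so this is an observation about the numbers; the function names are invented.

```python
import numpy as np

def cronbach_alpha(scores):
    """Alpha for a (subjects x items) array of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    sum_item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - sum_item_var / total_var)

def spearman_brown(item_reliability, k):
    """Reliability of a sum of k parallel items."""
    return k * item_reliability / (1.0 + (k - 1) * item_reliability)

for rel in (0.372, 0.528, 0.705):
    print(round(spearman_brown(rel, 27), 3))
```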

Other comparisons 

The comparisons of the decision test and the experiment are based on
means, standard deviations and correlations. For E, the means of
the decision test and the three repetitions of the experiment are 0.744,
0.695, 0.715 and 0.731, respectively. The corresponding values for
E_B become 0.849, 0.847, 0.847 and 0.852. Remembering that the
decision test was given after the experiment, we discover a time
trend for E, but not for E_B. However, the differences are small,
not greater than 0.050. The average standard deviations for E are
0.165, 0.191, 0.184 and 0.179, while E_B shows 0.130, 0.143, 0.142
and 0.134. Again, we find time trends. Thus E goes up with time
and the group becomes more homogeneous for both E and E_B. The
relationships between E and E_B are the same for the decision test
as for the experiment. Concerning the correlations, something was
already mentioned in connection with reliability. The average
correlations between the decision test and the repetitions of the experiment
are 0.228, 0.240 and 0.333 for E and 0.305, 0.294 and 0.318 for E_B.
The average correlations between repetitions are (0.488, 0.441,
0.598) and (0.613, 0.575, 0.676) for E and E_B, respectively.

Results in accordance with factors D, P and C have already been
discussed for the experiment, as far as means and standard
deviations are concerned. No ANOVA results have been produced for the
decision test but the different means show the same pattern here as
for the experiment, with the possible exception of PC for the
dependent variable E. The equivalence is also valid for standard deviations,
again with the exception of PC. The decision test shows the same
kind of effect for E, although not so pronounced as in the experiment.
For E_B, there is no PC effect in the experiment, while the decision
test has an effect opposite to that for E: the most asymmetric
situations show the smallest standard deviations and the least asymmetric
situations show the greatest standard deviations. Why this is so is
hard to say. As we have no corresponding effect for the standard
deviations of n, it may show that the symmetric situations are less
robust to deviations from the optimal choice of n. This is true for d =
0.2 and 0.3, but not for d = 0.1, and because the situations with d =
0.1 have less effect on E_B variation, due to robustness, it may be
generally true.

The correlations show no uniformity at all. There are different
effects for the decision test and the experiment as well as for E and
E_B, and they will not be discussed. The average correlation between the
experiment and the decision test is not very great but canonical
correlation analyses (between the decision test and each of the repetitions)
show that the correlations should not be regarded as unessential. The
first canonical correlations are in the neighbourhood of 1.000 with
normal deviates of the χ² values of 6.9, 7.5 and 5.0 for E and 13.4,
14.1 and 12.2 for E_B. However, the analyses show some numerical
instability due to many variables in relation to the number of
individuals, and this also has the effect of raising the greatest canonical
correlations. I therefore see little reason to discuss these analyses
in detail. (A somewhat more reasonable analysis would have been to find the
correlations with the restriction of equal weight vectors, but no such
program was available.)
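
The inflation of the greatest canonical correlation can be illustrated with pure noise. The sketch below draws two unrelated sets of 27 variables for 57 simulated subjects and computes the canonical correlations from the QR factors of the centred data. It is not a reanalysis of the study's data; the random seed, the normal distribution and the function name are assumptions made for the illustration.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and of Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)   # orthonormal basis of each variable set
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

rng = np.random.default_rng(0)
n_subjects, n_items = 57, 27
X = rng.standard_normal((n_subjects, n_items))  # unrelated noise
Y = rng.standard_normal((n_subjects, n_items))  # unrelated noise
first = canonical_correlations(X, Y)[0]
print(first)  # close to 1 although X and Y are independent
```

Calling the same function with fewer variables per set shows the first canonical correlation dropping, which is the sense in which the number of variables relative to the number of individuals drives the result.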

For both E and E_B, factor analyses have been performed for each of
the sets of 27 situations. This kind of factor analysis gives a principal
axis solution and a varimax rotation, see Dixon (1967). The
communality estimates are squared multiple correlations and only factors with
eigenvalues exceeding 1.0 have been rotated. The analysis is not very
satisfying, but no program for direct comparisons of structures was
available. The dependent variable E gave 6 rotated factors for the
decision test and 5 factors for each repetition. The variable E_B gave 4
rotated factors for the decision test and 5, 4 and 4 for the repetitions
of the experiment. The number of factors is reasonable for each
analysis, explaining between 0.829 and 0.863 of the total common
variance (estimated as the sum of the 27 squared multiple correlations). The
average absolute deviation of eigenvalues of successive unrotated
factors between the decision test and the repetitions tells us something
about the structures. (The sum comprises the five first unrotated
factors.) We get 0.37, 0.26 and 0.40 for E, but 1.38, 0.86 and 0.90
for E_B, thus indicating that the distributions of eigenvalues differ
more for E_B. Corresponding values between repetitions are 0.44,
0.50 and 0.18 for E and 0.60, 0.52 and 0.28 for E_B, also meaning
that the decision test differs more for E_B. Similar results are
obtained for factor loadings of the unrotated factors. The average
number of loadings (for the first five factors) which differ by more than
0.30, when corresponding values of two factor analyses are compared,
is 5.9 and 7.1 for E and 10.1 and 6.7 for E_B. The first figure
refers to comparisons between the decision test and the experiment and
the second one refers to comparisons within the experiment. The
first factor shows better equivalence than the others, which are not
very similar. The factor analyses seem to show that the decision test
is different from the experiment for E_B, relative to the difference
within the experiment. No attempts have been made to "interpret" the
rotated factors.

Intelligence and efficiency 

The intelligence tests show sufficient discriminating ability and have
reasonable reliability. (See appendix 3.) Compared to Holmquist's
group, my subjects are better when it concerns "factor" V, W and S
and worse on "factor" F, where they also are somewhat more
homogeneous. The differences are probably due to age differences and to
the fact that my university people are a selected group of students
leaving the gymnasium. The reliability estimates shown by Holmquist
(1967) are, on the average, of the same magnitude as those which are
presented in appendix 3 in the column marked with r_1. The estimate
r_1 is a special Cronbach's alpha coefficient with the assumption of
equal item difficulties, implying that the total mean and variance are
sufficient for estimating the reliability. Since only the total number
of correctly answered items was punched for each subject and test,
this reliability estimate was the only accessible one. However, it is
known to be below the alpha coefficient to an extent which depends on
the variance of the item difficulties, see e.g. Horst (1966, p. 273).
I have therefore also made estimates on the assumption that the item
difficulties are rectangularly distributed, which I believe is more
reasonable than the assumption of zero variance. The new estimates
are shown in appendix 3 in the column marked with r_2. We see that
"factor" S has a somewhat higher average reliability, but almost all
estimates are reasonably high for group comparisons.
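
An alpha-type estimate that needs only the number of items, the mean and the variance of the total score, under equal item difficulties, corresponds to what is usually called Kuder-Richardson formula 21. The report does not name the formula, so identifying its first estimate with KR-21 is an assumption, and the numbers in the usage line are invented. The estimate based on rectangularly distributed difficulties is not sketched here.

```python
def kr21(n_items, mean_total, var_total):
    """Kuder-Richardson formula 21 for dichotomously scored items."""
    k = n_items
    return k / (k - 1) * (1.0 - mean_total * (k - mean_total) / (k * var_total))

# Invented test: 20 items, mean total score 10, total-score variance 25.
print(round(kr21(20, 10.0, 25.0), 3))  # 0.842
```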

Eight canonical correlation analyses have been performed between
the ten intelligence tests and the 27 decision situations. For E as
well as for E_B the four analyses comprise the decision test and the
three repetitions of the experiment. All analyses show the same
result. Although the first canonical correlations are about 0.900, their
corresponding χ² values are not greater than those expected by
chance. This is in line with the magnitudes of the correlations
between the intelligence tests and the decision situations. The 1,080
correlations for the analyses of E have -0.299 ≤ r ≤ 0.486 and those for
E_B have -0.299 ≤ r ≤ 0.483. The S and the I tests have somewhat
higher correlations than the other tests, but on the whole these
correlation analyses cannot verify any substantial relations between
intelligence and efficiency.

Factor analyses have also been performed, of the same kind as
was described before. When the intelligence tests are analysed alone
we get two rotated factors "explaining" 0.941 of the total common
variance. The first one is spatial and inductive, the other one being
verbal (V and W). Eight further factor analyses were performed, each
comprising the intelligence tests and one efficiency variable, the last
one being a sum of 27 efficiency scores. (We have such a sum for
the decision test and the three repetitions, partly for E and partly
for E_B.) The addition of the efficiency variable does not change the
result of the intelligence variables very much. Thus there are always
two factors, which "explain" about 90 per cent of the total common
variance. The only difference concerns the repetitions, where the
spatial-inductive factor becomes purely spatial.

The communality estimates for the efficiency variable are low
with 0.137 ≤ R² ≤ 0.340, while the corresponding values for the
average R² show 0.417 ≤ R² ≤ 0.444 for the eight analyses. The only
intelligence test where R² is raised when the efficiency variable is
added as a tenth independent variable is one of the spatial tests,
showing an average increase of 0.004 for the decision test and 0.048 for
the experiment. The average correlations between these variables are
0.177 and 0.400, respectively. As could be expected, the only
substantial loading of the efficiency variable is for the spatial factor,
with loadings between 0.242 and 0.465. Thus the result of the
canonical correlation analyses reappears: the efficiency of decision making
is not very dependent on intelligence, with the possible exception of
spatial ability, and this is valid for both E and E_B. I do not even know
if it is a purely spatial test: some of those who have used this test
assert that it is very frustrating to the subject and also is an endurance
test.

Only the experiment is discussed in this section, although some
comments are also applicable to the decision test. The items of this test
have, on the average, similar values for reliabilities, means and
standard deviations as do the situations of the experiment. The
correlations between corresponding items and situations are somewhat
lower than correlations between situations from repetition to
repetition. However, regarded as an instrument which gives an efficiency
sum score it is about as good as one repetition of the experiment -
and much cheaper. The efficiency seems to be rather unrelated to
intelligence as that is defined here, with the possible exception of
spatial ability.

The lens model 

I thought that this study was a Bayesian study, but after reading Slovic
& Lichtenstein (1971) I know better. The content of the study is, of
course, a Bayesian one, but, according to their excellent paper, the
approach of the study is mainly a regression approach. Brunswik's
lens model is more or less applicable to the treatment of data of this
report (see appendix 4). The stimulus dimensions or cues are d,
C_0/C_1 and P(H_0), which by suitable dummy variable coding give rise
to seven effects. The correlations between independent and dependent
variables, for a subject or a group of subjects, called utilization
coefficients, are squared here and denoted ω²; these are squared
multiple correlations between the dependent variable and the dummy
variables defining an effect. Due to orthogonality the squared
consistency index r_s² is then equal to the sum of the seven ω². Low r_s
values are said to show inconsistency. You could just as well call it
irrelevance, because low values may mean that the subject uses
other cues than those which the experimenter thinks he is presenting.
Both "explanations" can be more or less correct simultaneously. On
the criterion side the corresponding utilization coefficients are called
ecological validities, and an index of the environmental predictability
r_c is the counterpart of the consistency index. In this study the
squared validities are the different ω² values of the statistical model
and the sum of these values stands for the squared index of the
environmental predictability. Neither the achievement index r_a nor the
matching index r_m is calculated as the lens model prescribes.
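
A minimal sketch of how seven squared utilization coefficients could be obtained for one subject in a balanced 3 x 3 x 3 design with repetitions is given below. It uses sums of squares of marginal means, which in a balanced design gives the same result as regression on orthogonal dummy variables. The array layout, the function name and the random data are assumptions; this is not the report's own computation.

```python
import numpy as np
from itertools import combinations

def utilization_profile(y):
    """Squared utilization coefficients for D, P, C and their interactions.

    y has shape (3, 3, 3, reps): levels of D, P, C and repetitions.
    """
    y = np.asarray(y, dtype=float)
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()
    names = "DPC"
    ss = {}
    for order in (1, 2, 3):
        for axes in combinations(range(3), order):
            label = "".join(names[a] for a in axes)
            other = tuple(a for a in range(4) if a not in axes)
            means = y.mean(axis=other, keepdims=True)    # marginal means
            n_per_mean = y.size / means.size
            effect = n_per_mean * ((means - grand) ** 2).sum()
            # remove the lower-order effects contained in this one
            effect -= sum(v for k, v in ss.items() if set(k) < set(label))
            ss[label] = effect
    return {k: v / ss_total for k, v in ss.items()}

rng = np.random.default_rng(1)
y = rng.normal(size=(3, 3, 3, 3))           # invented responses
profile = utilization_profile(y)
print(profile, sum(profile.values()))       # the sum is r_s squared
```

The seven values add up to the share of the total variance that lies between the 27 cell means, which is the squared consistency index in the sense used above.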



The application of the total lens model is only meaningful for the
dependent variable n. Here r_c is 1.0 so that the lens model equation
degenerates to r_a = r_s r_m, which means that the squared achievement
index has the sum of the subject's seven ω² values as its upper bound.
E_B could be regarded as an analogue to r_a, and a better one, because
r_a is not sensitive to mean differences of n and is an index for choice
variables here. The subject can very well rank n (for the 27
situations) in the same way as the statistician does and still have low
efficiency. The opposite may also be true in certain cases: due to small
variation of n and robustness we can obtain low r_a values and high E_B
values. The dependent variables E and E_B are themselves used in
ANOVA, but the lens model is only half here: since the efficiency is
always 1.0 for the statistician, the criterion side collapses (it is
already comprised by the dependent variables). The fourth dependent
variable, k_c, cannot, as far as I can see, be used within the scope of
the lens model, partly because it depends on n and partly because it
is non-numerical when n is zero.
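
The degenerate form r_a = r_s r_m can be verified numerically. In the sketch below the criterion is an exact linear function of the cues, so that r_c = 1, and the subject's responses are a different linear function plus noise. Ordinary least squares on raw cue values stands in for the dummy-coded effects of the study; the cue values, weights and function name are all invented for the illustration.

```python
import numpy as np

def lens_indices(cues, subject, criterion):
    """Achievement, consistency, predictability and matching indices."""
    X = np.column_stack([np.ones(len(cues)), cues])
    fit = lambda v: X @ np.linalg.lstsq(X, v, rcond=None)[0]
    corr = lambda a, b: np.corrcoef(a, b)[0, 1]
    y_hat, c_hat = fit(subject), fit(criterion)
    return {
        "r_a": corr(subject, criterion),   # achievement
        "r_s": corr(subject, y_hat),       # subject consistency
        "r_c": corr(criterion, c_hat),     # environmental predictability
        "r_m": corr(y_hat, c_hat),         # matching
    }

rng = np.random.default_rng(0)
cues = rng.normal(size=(27, 3))                        # 27 invented situations
criterion = cues @ np.array([1.0, -0.5, 0.25])         # perfectly predictable
subject = cues @ np.array([0.2, 0.4, 0.1]) + rng.normal(scale=0.5, size=27)
idx = lens_indices(cues, subject, criterion)
print(idx["r_a"], idx["r_s"] * idx["r_m"])             # equal when r_c = 1
```

Since r_m cannot exceed 1 in absolute value, the squared achievement index is bounded by r_s squared, which is the bound stated in the text.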

When using the full lens model the statistician is regarded as the
criterion with r_c = 1.0. This is not necessary, e.g. having the same
statistician making sequential observations will produce an r_c value
less than 1.0. The criterion can of course also be other things than
a statistical model. The most used alternative is "true" data -
observations from a follow-up study - but it can just as well be constituted
by observations with another response method or the result of another
subject or a group of subjects. And there is nothing that prevents you
from having a second model on the subject side, thus using the lens
model to compare two models. Nor are there any obstacles to
generalizing the lens model to a multivariate model, although there will
be problems, as for other multivariate models, of creating convenient
indices.

The statistical model is used in two ways in this study. For n, it is
used on the criterion side of the lens model (and something like that
for k_c, too), while for E and E_B the model is used to evaluate the
choice variables (to construct the consequence variables). We can
say that n is evaluated twice: first by using the lens model to compare
the utilization coefficients with the ecological validities (the choice
level), second by calculating E_B and looking at its utilization
coefficients (the consequence level). The fact that the full lens model does
not work with E_B or E (or the often used accuracy ratio) is not a
general property of a consequence variable. For instance, it will work
with R_B and R. However, the full lens model will, for some cases,
collapse when the criterion side is occupied by the same entity as
that which is used for the construction of the consequence variable.
Its values for this entity are then the same constant for all situations.
This will in general not happen if one uses different entities for the
two purposes, e.g. if the criterion side is represented by a special
subject and the statistical model is only used to get E and E_B.

Results 



The discussion here builds on the paper by Slovic & Lichtenstein (1971).
This is hardly any restriction, since this paper broadly reviews much
research in the Bayesian area and other kinds of research using the
lens model. Although the paper is almost only concerned with
probability (revision) experiments - and not with information-seeking
experiments, which this study is - some concepts and results can still
be applied and discussed here, at least in connection with the choice
variables k_c and n.

One of the key concepts in Bayesian research is that of
conservatism. Apparently this word means different things to different
researchers. For the one who performs a probability revision
experiment it means that the subject makes too small a revision in
comparison with the prescription of Bayes's theorem, and this can be measured
in several ways. Others have used an index based on the achievement
index r_a or compared the utilization coefficients with the ecological
validities. The crucial issue is
whether you will define conservatism as a measure of distance or as
a measure of (co)variance. Take factor D for the dependent variable
n as an example. The statistician has means 7.4, ?9.7 and 24.3 for
increasing d values, while the corresponding means for the total group
of subjects are 23.2, 20.6 and 19.5. For the statistician ω² is 0.676
and for the average subject it is 0.134. The distance measure shows
that this subject is conservative for d = 0.1 and radical otherwise.
The variance of the means of the subject is less than that of the
statistician, so from this point of view the subject is conservative.
Comparing ω² values will also result in conservatism here. As you
can see no choice can tell the whole story and different indices can
classify subjects differently. My choice is to define the degree of
conservatism as ω²(statistician)/ω²(subject), for a certain effect. This
implies that conservatism is defined as a lack of relative variance. If
you like it, you may also say that conservatism, for an effect, means
lesser diagnosticity than the model prescribes. The above ratio should
only be of interest when ω²(subject) is sufficiently high. Although a
ratio of 10 indicates a considerable conservatism, I hesitate to find it
essential if we have e.g. 10 = 0.020/0.002.
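
The ratio defined above is simple enough to state as a small function. The cut-off below which the ratio is not reported is invented for this sketch, since the text only asks that the subject's ω² be "sufficiently high"; the function name is likewise an assumption.

```python
def conservatism_ratio(omega2_statistician, omega2_subject, min_subject=0.01):
    """Degree of conservatism for one effect.

    Returns None when the subject's squared utilization coefficient is
    below min_subject (a hypothetical cut-off), since a ratio built on
    a negligible denominator says little.
    """
    if omega2_subject < min_subject:
        return None
    return omega2_statistician / omega2_subject

# Factor D for n: statistician 0.676, average subject 0.134.
print(round(conservatism_ratio(0.676, 0.134), 2))  # 5.04
# The case from the text with a ratio of 10 but negligible values.
print(conservatism_ratio(0.020, 0.002))            # None
```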

In trying to explain conservatism one has used the labels
misperception, misaggregation and response bias. Misperception means
subjective transformation of the cue values, misaggregation means
that the subject's policy for using the cues in order to generate a value
of the dependent variable is deviant from the model, while response
bias can involve such things as sensitivity to different response modes
and the range of the cue values. It is not often that experiments are
designed to differentiate between these explanations, and like any
other information-seeking experiments which I know of this study
cannot differentiate between the possible explanations. Of course,
this does not prevent you from discussing them.

Although trivial, it is perhaps best to underline that conservatism
as well as its explanations are relative concepts. While a subject may
be conservative versus one model (or another subject) he may be
radical in comparison with another model (or a third subject), and
while one model classifies your judgments as misperceptions, another
one may call them misaggregations, or both. I think that for every
consistent behaviour you can construct a model which, on the average,
describes this behaviour. This is not very interesting as it presumably
means one model for each subject. However, the models and thereby
the subjects can be clustered according to certain properties to obtain
more general knowledge. (Analogous ways have been tried, which in
this case could have involved a data matrix of order 60 x 7 with the
seven ω² values as variables. Some kind of method for latent
structure analysis could then be used to cluster the subjects into subgroups
of similar ω² profiles.) Instead of doing this very extensive labour
the researcher chooses prior models with which he compares the
subjective behaviour. Different camps of researchers have different such
models and therefore can have different explanations of "deviant"
behaviour. It may thus be wise not to say e.g. misaggregation but
misaggregation in comparison with a Bayes strategy.

The ecological validities for n are small for four effects, very
high for D and noticeable for PC and DPC. According to our definition,
all subjects show conservatism for D and all but one for DPC,
too. On the other hand, several subjects show radicalism, especially
for P but we may also mention PC. This is not in line with Slovic &
Lichtenstein (1971), who report that interaction effects have small
increments in predictive power. Here, 17 out of 60 subjects have
results such that 0.100 < ω² ≤ 1.000. It is also said that the most
important cue usually accounts for more than 40 per cent of the
predictable variance (max(ω²)/Σω²) and I can somewhat agree with
it. Twenty subjects show this result, seven allocated on D, seven on
P, one on C and five on PC. However, the statement that the three
most important cues usually cover more than 80 per cent cannot be
confirmed. This is only valid for five subjects and the statistician.
Thus, the majority of the subjects is not focusing on a single cue and
they have quite varying squared consistency indices, 0.160 ≤ r_s² ≤
1.000 with a mean of 0.596, with very different ω² profiles.
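
The two shares discussed above (largest cue, three largest cues) are
simple ratios over a subject's ω² profile. A minimal sketch, assuming a
made-up profile over the seven effects of the D x P x C design; the
numbers and the helper name cue_shares are illustrative and do not come
from the study:

```python
# Hypothetical omega-squared profile for one subject (seven effects of the
# D x P x C design). The values are invented for illustration only.
profile = {"D": 0.30, "P": 0.08, "C": 0.02, "DP": 0.05,
           "DC": 0.03, "PC": 0.10, "DPC": 0.12}

def cue_shares(omega_sq):
    """Return (share of the largest effect, share of the three largest)."""
    total = sum(omega_sq.values())          # the predictable variance
    ranked = sorted(omega_sq.values(), reverse=True)
    return ranked[0] / total, sum(ranked[:3]) / total

top1, top3 = cue_shares(profile)
print(f"largest cue: {top1:.3f}, three largest: {top3:.3f}")
```

For this invented profile the largest cue passes the 40 per cent mark
while the three largest stay below 80 per cent, which is the pattern
the paragraph describes for many subjects.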

The most remarkable feature about the choice of k_c is the asymmetry.
Most subjects do not "like" H_0, at least for large n, and more
or less consistently choose k_c less than k_B, the most extreme choice
being k_c = 10 for n = 100. This implies that there is a tendency for
several subjects to choose k_c more extreme than k_B for situations
with k_B ≤ n/2. This does not seem to be in line with the mainstream
of results either. Slovic & Lichtenstein (1971) say that subjects are
never as sensitive to the experimental conditions as they ought to be
for the Bayesian research they have summarised. However, the extent
to which this statement does not hold in this study is dependent
on the choice of a criterion. For instance, there are 45 subjects with
B < 0.5 (k_c = A + Bn), but there are only 20 subjects having
s²(k_c)/s²(n) greater than the corresponding ratio for the model.
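
Both criteria mentioned above can be computed from one subject's
(n, k_c) pairs. The sketch below assumes invented data for a single
subject and uses an ordinary least-squares line; it does not compute
the corresponding ratio for the model, which would require the k_B
values of the actual design:

```python
import numpy as np

# Hypothetical choices of one subject: sample size n and critical value k_c.
n = np.array([10, 20, 40, 60, 80, 100], dtype=float)
k_c = np.array([4, 7, 13, 18, 22, 27], dtype=float)

# Least-squares fit of k_c = A + B*n; polyfit returns [slope, intercept].
B, A = np.polyfit(n, k_c, deg=1)

# Ratio of sample variances, s^2(k_c) / s^2(n).
variance_ratio = np.var(k_c, ddof=1) / np.var(n, ddof=1)

print(f"A = {A:.2f}, B = {B:.3f}, s2(k_c)/s2(n) = {variance_ratio:.3f}")
```

Since B² equals the squared correlation times s²(k_c)/s²(n), the
variance ratio can never fall below B², so the two criteria are related
but need not classify a subject in the same way.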

I have earlier in this paper suggested that the asymmetry of k_c
may be caused by an asymmetry of the apprehension of 0 and 1. This
gives rise to a misperception of the binomial frequency function, which
has been experimentally verified before. If the outcome 1 has a greater
impact than the outcome 0 we will get B < 0.5, provided no
misaggregations occur, that is, the subject uses the Bayes strategy within his
subjective apprehension (misperception) of the binomial frequency
function. Another alternative for misperception: suppose that the
subject does not quite trust the data. This can generate reliability
models like those in Schum & DuCharme (1971), which perhaps can
be used to "explain" the asymmetry of k_c. I also mentioned earlier
(p. 22) that the response mode can have caused the bias.

The above examples of misperception may also be used as a
descriptive model for some subjects and perhaps also describe some
subjects' choices of both k_c and n. But, in comparison with the
Bayesian model, subjects most likely also misaggregate cues when
choosing k_c and n. (Provided no misperception, this is e.g.
reflected by the ω² profile.) The situations are complex and I think that
the subjects simplify reality by creating simple rules. These can be
followed more or less rigorously. A few subjects have rules which
I can see have been followed all the way, e.g. "I never make any
observations" and "Regardless of d, I make 10 observations when P(H_0)
and c_0/c_1 balance each other (e.g. H_0 improbable but cheap) and
make no observations in other cases". Examples of rules which are
almost always followed are "If c_0/c_1 is 1.0 I will make 10
observations, otherwise I make 20 observations," and "If P(H_0) is 0.3 I will
make 20 observations, otherwise I make 10 observations". Then
there may be more stochastic rules like "I always make between 10
and 40 observations, but for every trial I just guess".

This scattered picture can make you rather pessimistic about ever
finding any general results. It is quite clear that it is very dangerous
to present only group results. As the individual strategies vary
considerably you can get almost arbitrarily varying group results by
changing the group composition. We may also remember that group
results can give peculiar results in comparison to the individual
results, due to lack of commutability. For instance, ω² for the average
subject derived from the group ANOVA is not equal to the average ω²
calculated from the individual ANOVAs. Not knowing such properties
can generate more or less unreasonable conclusions. It is easily
done, because I believe that most of us try to look upon the world
as simply as possible, perceive situations as symmetric,
commutative, full of linear relations and so on.
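
The lack of commutability can be made concrete with a deliberately
extreme toy case. The sketch below assumes a one-factor layout with
three levels and three replications, two invented subjects who use the
factor in opposite directions, and treats the "average subject" as the
cell-wise mean of their responses; this is a simplification of the
group ANOVA, not the analysis used in the study:

```python
import numpy as np

def omega_sq(data):
    """SS_effect / SS_total for a one-factor layout.

    data has shape (levels, replications); the ratio of sums of squares
    follows the definition of the utilization coefficient used here.
    """
    grand = data.mean()
    ss_total = ((data - grand) ** 2).sum()
    ss_effect = data.shape[1] * ((data.mean(axis=1) - grand) ** 2).sum()
    return ss_effect / ss_total

# Two hypothetical subjects, three factor levels, three replications each.
subject_a = np.array([[10., 11., 12.], [20., 21., 22.], [30., 31., 32.]])
subject_b = np.array([[30., 32., 31.], [20., 22., 21.], [10., 12., 11.]])

# Average of the individual coefficients.
mean_of_individual = float(np.mean([omega_sq(subject_a), omega_sq(subject_b)]))

# Coefficient of the "average subject" (cell-wise mean of the responses).
omega_of_average = omega_sq((subject_a + subject_b) / 2)

print(f"mean of individual: {mean_of_individual:.3f}")
print(f"average subject:    {omega_of_average:.3f}")
```

Each invented subject uses the factor almost perfectly, yet the effect
vanishes for the averaged data because the two strategies cancel.
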

One bold solution to this multitude of behaviour is to neglect it.
For instance, who cares about conservatism or radicalism if the
efficiency of the resulting decision is high? Although this is an
extreme opinion there is a kernel of truth in it. It is somewhat strange
that, while decision theory itself preaches that it is the consequences
which count, researchers on human decision making usually
concentrate on choice variables. I think we can add another dimension to the
discussion of deviant behaviour if we also consider its consequences
when possible. Psychologists naturally have an interest in choice
variables, but from an "economic" viewpoint these will only be of
importance for situations where they indicate non-optimal behaviour
of low efficiency.

The consequence variables E and E_B have, on the average, higher
consistency than the choice variable n. We obtain 0.291 ≤ r_s² ≤ 0.979
with a mean of 0.718 for E and 0.304 ≤ r_s² ≤ 1.000 with a mean of
0.773 for E_B. Although E_B has a higher mean than E, it is not so for
every subject. We also have r_s²(E_B) = 1.000 if r_s²(n) = 1.000 or when
the subject always makes the same number of observations (in which
case r_s²(n) is not defined). However, I do not know whether, for two
subjects i and j, r_s²(n_i) < r_s²(n_j) implies r_s²(E_Bi) < r_s²(E_Bj).
Probably not. As for n, the ω² profiles for E and E_B are very different
from subject to subject, but factor D has the largest average utilization
coefficients and only DPC for E_B also has an average ω² above 0.100.
Speaking about consequences, D is the most important cue for most
subjects, but its different levels are not of the same interest. It is,
above all, d = 0.3 which tests the subjects, while d = 0.1 has low
discriminating power (or high robustness). The latter situation is
analogous to dealing with intelligent persons: no matter how you
teach them, they will learn some elementary material.

I do not know how common situations such as those with d = 0.1 are,
but I believe that a great many situations can be described by
criterion functions which are flat around their optimal point. On the other
hand, there are definitely situations where choices are crucial. My
proposal is that more experiments should be designed with the latter
kind of situations. This may not be easy, but it will give the choice
variables an importance which they often do not have today.

APPENDICES



Appendix 1. Choices and expected losses of the statistical model.

When n_0 = 0 the hypothesis chosen is indicated for k_B. The unit of
expected loss is one Swedish öre, which for this experiment constitutes
one tenth of the cost of one observation.

Appendix 2. The distribution of subjects on faculty, sex and age.

[Table: number of subjects by faculty (Humanities, Social sciences,
Natural sciences), sex (Female, Male) and age (19-24, 25-29, 30-45).
The cell counts are not legible in the source.]

Marginal totals:

Humanities:       [illegible]
Social sciences:  23
Natural sciences: [illegible]

Female: 28
Male:   32

19-24: 31
25-29: 22
30-45:  7

Appendix 3. The intelligence tests.

The tests V_1 and V_2 concern verbal understanding; one is a test on
synonyms and the other contains items on verbal analogies. The next two
tests, W_1 and W_2, are tests on verbal fluency. The task of W_1 is to
write as many words as possible which begin with "s" and end with "a",
while the task of W_2 concerns words which end with "al". The tests I_1
and I_2 measure inductive reasoning and the items of both tests are
series of numbers for which a new number should be added. S_1 and S_2
are spatial tests, the items of which are three-dimensional bodies
unfolded in two dimensions, and the task is to say something about
their three-dimensional forms. The final tests, P_1 and P_2, are
supposed to measure the perceptual factor. One of them has to do with
sorting and the other concerns coding.

Regarding the columns of the above table, m stands for the arithmetic
mean and s is the standard deviation. Of the two r columns, the first
is a specialised Cronbach's alpha coefficient (also known as
Kuder-Richardson's formula 21) and the second is a special estimate of
Cronbach's alpha coefficient as it is discussed on page 35.

Appendix 4. The lens model (After Slovic & Lichtenstein, 1971).

r_is    Utilization coefficient for cue x_i.
r_ie    Ecological validity for cue x_i.
r_s     Consistency index.
r_e     Index of environmental predictability.
r_a     Achievement index.
r_m     Matching index.
c       Defined as the correlation r between (y_s - ŷ_s) and (y_e - ŷ_e).

Both ŷ_s and ŷ_e are linear regression functions of the cue values.

For ANOVA, r_a is equal to the total correlation, r_m is a between-cells
correlation and c is a within-cells correlation, while r_s and r_e
refer to between cells, as described in the Final comment.
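
The indices listed above are tied together by the standard lens model
equation, r_a = r_m r_e r_s + c sqrt((1 - r_e²)(1 - r_s²)), which this
appendix does not spell out. The sketch below checks that identity
numerically on simulated data. The cue weights, noise levels and helper
names (linear_fit, corr) are assumptions made for illustration; the
regression is ordinary least squares rather than the ANOVA used in the
study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 cases, three cues, a criterion y_e and judgments y_s.
cues = rng.normal(size=(200, 3))
y_e = cues @ np.array([0.8, 0.3, 0.1]) + rng.normal(scale=0.5, size=200)
y_s = cues @ np.array([0.4, 0.5, 0.0]) + rng.normal(scale=0.7, size=200)

def linear_fit(x, y):
    """Least-squares prediction of y from x, with an intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef

def corr(a, b):
    """Product-moment correlation between two vectors."""
    return float(np.corrcoef(a, b)[0, 1])

yhat_e, yhat_s = linear_fit(cues, y_e), linear_fit(cues, y_s)

r_a = corr(y_e, y_s)                    # achievement index
r_e = corr(y_e, yhat_e)                 # index of environmental predictability
r_s = corr(y_s, yhat_s)                 # consistency index
r_m = corr(yhat_e, yhat_s)              # matching index
c = corr(y_e - yhat_e, y_s - yhat_s)    # correlation between the residuals

# Right-hand side of the lens model equation.
rhs = r_m * r_e * r_s + c * np.sqrt((1 - r_e**2) * (1 - r_s**2))
print(f"r_a = {r_a:.6f}, reconstructed = {rhs:.6f}")
```

The identity is exact here because both regressions use the same cues,
so each set of residuals is uncorrelated with both sets of fitted
values.
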

Appendix 5. Symbols used frequently.

A         The intercept of k_c = A + Bn.
B         The regression coefficient of k_c = A + Bn.
C         Factor of cost ratios c_0/c_1 with levels 0.5, 1.0 and 2.0.
c_0       The loss generated by a wrong choice of H_0.
c_1       The loss generated by a wrong choice of H_1.
D         Factor of d values with levels 0.1, 0.2 and 0.3.
d         Diagnosticity of data, defined as d = p_1 - p_0.
E         The efficiency of the choices of k_c and n, defined as
          E = R(k_B, n_0)/R(k_c, n) or, shorter, R_0/R.
E_B       The efficiency of the choice of n, defined as
          E_B = R(k_B, n_0)/R(k_B, n) or, shorter, R_0/R_B.
H_0       The null hypothesis p = p_0.
H_1       The alternative hypothesis p = p_1.
k         The number of ones of n observations.
k_B       The critical value of k according to the statistical model. It
          chooses H_0 if k ≤ k_B and H_1 otherwise. (The k value of the
          Bayes strategy.)
k_c       The critical value of k chosen by a subject. He chooses H_0 if
          k ≤ k_c and H_1 otherwise.
m         The arithmetic mean.
n         The number of observations (chosen by a subject).
n_0       The number of observations according to the statistical model.
ω²        Hays' ω², defined as SS_i/SS_tot for an effect i. It is a squared
          multiple correlation between the dependent variable and the
          dummy coded variables defining the effect.
P         Factor of prior probabilities P(H_0) with levels 0.3, 0.5 and 0.7.
P(.)      The probability of something. Especially, P(H_0) is the prior
          probability of H_0 (before sampling) and P(H_0|k, n) is the
          posterior probability of H_0 (after sampling, when k and n are
          known).
p_0       The probability that an observation will have the outcome 1
          (according to H_0).
p_1       The probability that an observation will have the outcome 1
          (according to H_1).
R²        The squared multiple correlation.
R(k_B, n) = R_B
          The total expected loss of choosing n observations and using
          the critical value k_B.
R(k_c, n) = R
          The total expected loss of choosing n observations and using
          the critical value k_c.
R(k_B, n_0) = R_0
          The total expected loss of choosing n_0 observations and using
          the critical value k_B.
r         The product-moment correlation.
S         Factor of the subjects with 60 levels.
SS        Sum of squares.
s         The standard deviation.
T         Factor of replications with three levels.
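
The loss and efficiency symbols can be tied together in a short
computation. The sketch below is written under stated assumptions
rather than taken from the study: p_0 = 0.35 and p_1 = 0.65 (so that
d = 0.3), losses c_0 = c_1 = 1000 öre, a search limit of 60
observations and the subject's choices are all invented, and only the
observation cost of 10 öre follows from Appendix 1. The function names
are hypothetical. k_B and n_0 are found by direct minimisation of the
total expected loss, which for a threshold rule on k should coincide
with the Bayes strategy.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(K <= k) for K ~ Binomial(n, p); returns 0 for k = -1."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def expected_loss(k_crit, n, p0, p1, prior_h0, c0, c1, obs_cost):
    """R(k_crit, n) for the rule: choose H0 if k <= k_crit, otherwise H1."""
    wrong_h0 = (1 - prior_h0) * binom_cdf(k_crit, n, p1) * c0  # H1 true, H0 chosen
    wrong_h1 = prior_h0 * (1 - binom_cdf(k_crit, n, p0)) * c1  # H0 true, H1 chosen
    return obs_cost * n + wrong_h0 + wrong_h1

def bayes_k(n, **params):
    """k_B for a given n; k = -1 stands for 'always choose H1'."""
    return min(range(-1, n + 1), key=lambda k: expected_loss(k, n, **params))

def bayes_n(n_max, **params):
    """n_0: the sample size in 0..n_max with the smallest R(k_B, n)."""
    return min(range(n_max + 1),
               key=lambda n: expected_loss(bayes_k(n, **params), n, **params))

# Assumed values (not from the paper, except the 10 öre per observation).
params = dict(p0=0.35, p1=0.65, prior_h0=0.5,
              c0=1000.0, c1=1000.0, obs_cost=10.0)

n_0 = bayes_n(60, **params)
R_0 = expected_loss(bayes_k(n_0, **params), n_0, **params)

# A hypothetical subject who takes 10 observations with critical value 3.
n, k_c = 10, 3
R = expected_loss(k_c, n, **params)
R_B = expected_loss(bayes_k(n, **params), n, **params)

E = R_0 / R        # efficiency of the choices of k_c and n
E_B = R_0 / R_B    # efficiency of the choice of n alone
print(f"n_0 = {n_0}, E = {E:.3f}, E_B = {E_B:.3f}")
```

Because R_0 ≤ R_B ≤ R by construction, E can never exceed E_B, and both
are at most 1.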